Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

The Scientific World JOURNAL ◽

10.1100/2012/365104 ◽

2012 ◽

Vol 2012 ◽

pp. 1-10 ◽

Cited By ~ 19

Author(s):

Chun-Tien Chang ◽

Chi-Neu Tsai ◽

Chuan Yi Tang ◽

Chun-Houh Chen ◽

Jang-Hau Lian ◽

...

Keyword(s):

Dna Sequences ◽

Copy Number ◽

Tandem Repeats ◽

Direct Sequencing ◽

Nucleotide Polymorphisms ◽

Dna Index ◽

Paralogous Genes ◽

Base Calling ◽

Mixed Sequence ◽

Reference Sequences

The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.
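The idea of reading a heterozygous call as two distinct sequences with the help of a reference can be illustrated with a minimal sketch. This is not MSR's algorithm, which the abstract does not describe; it assumes the mixed read has already been reduced to a string of standard IUPAC codes of the same length as the reference, with no indels. The function name `split_mixed` and the example sequences are hypothetical.

```python
# Standard IUPAC codes for single bases and two-base mixtures.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def split_mixed(mixed, reference):
    """Split a mixed read into two sequences, guided by a reference.

    Assumption (not from the source): both strings have equal length and
    the mixture contains no indels, so positions line up one to one.
    """
    if len(mixed) != len(reference):
        raise ValueError("mixed read and reference must have equal length")
    first, second = [], []
    for code, ref in zip(mixed.upper(), reference.upper()):
        bases = IUPAC[code]
        if len(bases) == 1:
            # Homozygous position: both sequences carry the same base.
            base = next(iter(bases))
            first.append(base)
            second.append(base)
        elif ref in bases:
            # The reference explains one base; the other is the variant.
            first.append(ref)
            second.append(next(iter(bases - {ref})))
        else:
            # Neither base matches the reference: keep alphabetical order.
            a, b = sorted(bases)
            first.append(a)
            second.append(b)
    return "".join(first), "".join(second)


print(split_mixed("ACRTY", "ACGTC"))
```

A real chromatogram with a heterozygous indel shifts one allele relative to the other, so position-by-position splitting of this kind would not be sufficient there.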

Inverted duplicate DNA sequences increase translocation rates through sequencing nanopores resulting in reduced base calling accuracy

Nucleic Acids Research ◽

10.1093/nar/gkaa206 ◽

2020 ◽

Vol 48 (9) ◽

pp. 4940-4945

Author(s):

Pieter Spealman ◽

Jaden Burrell ◽

David Gresham

Keyword(s):

Dna Sequences ◽

Copy Number ◽

Copy Number Variants ◽

Electrical Current ◽

Nanopore Sequencing ◽

Aberrant Behavior ◽

Structural Variants ◽

Dna Structures ◽

Base Calling ◽

Secondary Dna Structures

Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low confidence, incorrect and missed base calls. Inverted duplicate DNA sequences in both yeast and human samples were observed to have systematic elevation in the electrical current detected at the nanopore, increased translocation rates and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of transit of the DNA through the nanopore.

A Linear Algebra Approach to Fast DNA Mixture Analysis Using GPUs

10.1101/174813 ◽

2017 ◽

Author(s):

Siddharth Samsi ◽

Brian Helfer ◽

Jeremy Kepner ◽

Albert Reuther ◽

Darrell O. Ricke

Keyword(s):

Linear Algebra ◽

Dna Sequences ◽

Tandem Repeats ◽

Matrix Multiplication ◽

Forensic Analysis ◽

Dense Matrix ◽

Nucleotide Polymorphisms ◽

Base Pairs ◽

Algebra Approach ◽

Speed Up

Analysis of DNA samples is an important tool in forensics, and the speed of analysis can impact investigations. Comparison of DNA sequences is based on the analysis of short tandem repeats (STRs), which are short DNA sequences of 2-5 base pairs. Current forensics approaches use 20 STR loci for analysis. The use of single nucleotide polymorphisms (SNPs) has utility for analysis of complex DNA mixtures. The use of tens of thousands of SNP loci for analysis poses significant computational challenges because the forensic analysis scales by the product of the loci count and number of DNA samples to be analyzed. In this paper, we discuss the implementation of a DNA sequence comparison algorithm by re-casting the algorithm in terms of linear algebra primitives. By developing an overloaded matrix multiplication approach to DNA comparisons, we can leverage advances in GPU hardware and algorithms for dense matrix multiplication (DGEMM) to speed up DNA sample comparisons. We show that it is possible to compare 2048 unknown DNA samples with 20 million known samples in under 6 seconds using an NVIDIA K80 GPU.
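To make the re-casting concrete, here is a small sketch of how a batch of profile comparisons can be written as one matrix product. It assumes each sample is encoded as a 0/1 vector over SNP loci (1 meaning the minor allele is present), which is an illustrative encoding rather than the one the paper uses, and it uses ordinary multiply-add in place of the paper's overloaded operation, which the abstract does not specify. The names `matmul`, `transpose`, and `shared_allele_counts` are hypothetical, and the code is plain Python rather than a GPU kernel.

```python
def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def shared_allele_counts(queries, knowns):
    """Entry [i][j] counts loci where query i and known j both carry a 1.

    The work grows with len(queries) * len(knowns) * number_of_loci, which
    is why a tuned dense matrix-multiply routine pays off at scale.
    """
    return matmul(queries, transpose(knowns))


# Hypothetical profiles over four loci.
queries = [[1, 0, 1, 1], [0, 1, 0, 0]]
knowns = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]
print(shared_allele_counts(queries, knowns))  # [[2, 1, 3], [0, 1, 1]]
```

Once the comparison has this shape, the same computation can be handed to any dense matrix-multiplication library without changing the surrounding logic.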

Variation among alfalfa somaclones in copy number of repeated DNA sequences

Genome ◽

10.1139/g93-119 ◽

1993 ◽

Vol 36 (5) ◽

pp. 906-912 ◽

Cited By ~ 10

Author(s):

K. K. Kidwell ◽

T. C. Osborn

Keyword(s):

Dna Sequences ◽

Copy Number ◽

Southern Blot Analysis ◽

Tandem Repeats ◽

Nuclear Dna ◽

Slot Blot ◽

Repeated Dna ◽

Repeated Dna Sequences ◽

Hybridization Intensity ◽

Medicago Sativa L

Repeated DNA sequences of alfalfa (Medicago sativa L.) somaclonal variants were analyzed to determine if changes in copy number had occurred during tissue culture. DNA clones containing highly repeated nuclear sequences from the diploid line HG2 (2x = 16) were slot blotted and probed with labeled DNAs from HG2 and several somaclones of HG2. Two DNA clones that differed visually in hybridization intensity among the plant DNAs and one clone that had constant hybridization intensity were selected and used as probes on Southern blots and slot blots containing equal quantities of DNAs from HG2 and 15 regenerants. Statistically significant differences were detected in the copy number of two anonymous DNA sequences initially selected as variable and in the copy number of sequences homologous to pea ribosomal DNA. Based on Southern blot analysis, these sequences appeared to be arranged as tandem repeats. The cloned sequence initially selected as stable did not vary significantly in copy number and it appeared to be arranged as a dispersed repeat. Both increases and decreases in copy number of repeated sequences were observed in plants from successive regeneration cycles. Results from this study indicate that specific repeated nuclear DNA sequences have changed copy number in plants regenerated from tissue culture. Key words: somaclonal variation, repeated DNA, slot blot, quantitative variation.

Identification of single nucleotide polymorphisms in GDF9 gene associated with litter size in Garut sheep

Indonesian Journal of Biotechnology ◽

10.22146/ijbiotech.42095 ◽

2019 ◽

Vol 24 (1) ◽

pp. 51 ◽

Cited By ~ 1

Author(s):

Resti Yuliana Rahmawati ◽

Sumadi Sumadi ◽

Tety Hartatik

Keyword(s):

Single Nucleotide Polymorphisms ◽

Litter Size ◽

Direct Sequencing ◽

Pcr Amplification ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Growth Differentiation Factor 9 ◽

Hardy Weinberg Equilibrium ◽

Reference Sequences ◽

Gdf9 Gene

The growth differentiation factor 9 (GDF9) gene has been regarded as having major impacts on ovulation rate and litter size in sheep. The aim of this study was to identify the single nucleotide polymorphisms (SNPs) of the GDF9 gene and their association with litter size in Garut sheep. For this purpose, a total of 60 Garut ewes were included in this study. Based on the sheep GDF9 reference sequence (GenBank Acc. No. AF078545.2), one pair of primers (5’-CTGCTGTTTAACCTGGATCGTG-3’ and 5’-GGAGAGCCATACCGATGTCC-3’ as forward and reverse, respectively) was used for PCR amplification. The results revealed that four SNPs (g.54C>T, g.60G>A, g.304G>A, and g.333G>A) were found in Garut sheep by direct sequencing. For SNP g.54C>T, the sheep exhibited the highest frequency of allele C and genotype CC. On the other hand, SNPs g.60G>A, g.304G>A, and g.333G>A showed a higher frequency of allele G than allele A, and the GG genotype was predominant in the population. SNP g.333G>A had a significant effect on litter size (p < 0.05), and ewes with the GG genotype had a higher litter size than those with the GA genotype. Genotype distributions for all identified SNPs were in agreement with Hardy-Weinberg equilibrium. We highlight that SNP g.333G>A may be useful as a genetic marker for litter size in Garut sheep.
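Agreement with Hardy-Weinberg equilibrium is commonly checked with a chi-square goodness-of-fit test on genotype counts. The sketch below shows that standard calculation for a biallelic SNP such as g.333G>A. The abstract does not say which test the authors used, and the genotype counts in the example are hypothetical, not the study's data. The function name `hardy_weinberg_chi_square` is also an assumption.

```python
def hardy_weinberg_chi_square(n_gg, n_ga, n_aa):
    """Return (frequency of allele G, chi-square statistic).

    Expected genotype counts under Hardy-Weinberg equilibrium are
    p^2 * N, 2pq * N, and q^2 * N, where p is the frequency of G.
    """
    total = n_gg + n_ga + n_aa
    if total == 0:
        raise ValueError("at least one genotyped animal is required")
    p = (2 * n_gg + n_ga) / (2 * total)  # each GG animal carries two G alleles
    q = 1 - p
    observed = (n_gg, n_ga, n_aa)
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


# Hypothetical counts for illustration only.
p, chi2 = hardy_weinberg_chi_square(36, 48, 16)
print(round(p, 3), round(chi2, 3))
```

With one degree of freedom, a statistic below 3.841 means the departure from equilibrium is not significant at the 0.05 level. With only 60 animals, some expected counts may be small, in which case an exact test is often preferred.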

INTERVERTEBRAL DISC DEGENERATION LINKED TO STRUCTURAL GENE VARIATIONS

Pakistan Journal of Medicine and Dentistry ◽

10.36283/pjmd8-4/012 ◽

2019 ◽

Author(s):

Saeeda Baig

Keyword(s):

Intervertebral Disc ◽

Disc Degeneration ◽

Intervertebral Disc Degeneration ◽

English Language ◽

Tandem Repeats ◽

Lumbar Disc ◽

Disease Process ◽

Nucleotide Polymorphisms ◽

Structural Protein ◽

Lumbar Disc Degeneration

In the recent past, the focus has shifted from identifying intervertebral disc degeneration as being caused by physical exposure and strain to linking it with a variety of genetic variations. The objective of this review is to provide an up-to-date summary of the existing research data on the relation of intervertebral disc degeneration to structural protein genes and their polymorphisms, and thus to help establish further avenues where research into causation and treatment is needed. A comprehensive search was conducted in PubMed and Google Scholar using the keywords “Collagen”, “COL”, “Aggrecan”, “AGC”, “IVDD”, “intervertebral disc degeneration”, and “lumbar disc degeneration”, and literature in the English language published from 1991 to 2019 was selected. There are many genes involved in the production of structural components of an intervertebral disc. The issues in production of these components involve the over-expression or under-expression of their genes, as well as single nucleotide polymorphisms and variable numbers of tandem repeats affecting their structures. These structural genes include primarily the collagen and the aggrecan genes. While genetic and environmental factors all come into play with a disease process like disc degeneration, the bulk of research now shows a significantly larger impact of heredity than of exposure. While further research is needed into some of the lesser studied genes linked to IVDD and also the racial variations in genetic makeup, the focus in the near future should be on establishment of genetic testing to identify individuals at greater risk of disease and deliberation regarding the use of gene therapy to prevent disc degeneration.

Nucleotide polymorphisms in three genes support host and geographic speciation in tree pathogens belonging to Gremmeniella spp.

Canadian Journal of Botany ◽

10.1139/b02-103 ◽

2002 ◽

Vol 80 (11) ◽

pp. 1151-1159 ◽

Cited By ~ 8

Author(s):

M Dusabenyagasani ◽

G Laflamme ◽

R C Hamelin

Keyword(s):

North America ◽

Dna Sequences ◽

North American ◽

Abies Balsamea ◽

Host Specialization ◽

Rrna Genes ◽

Nucleotide Polymorphisms ◽

Group I ◽

Geographic Separation ◽

Pinus Spp

We detected nucleotide polymorphisms within the genus Gremmeniella in DNA sequences of β-tubulin, glyceraldehyde phosphate dehydrogenase, and mitochondrial small subunit rRNA (mtSSU rRNA) genes. A group-I intron was present in strains originating from fir (Abies spp.) in the mtSSU rRNA locus. This intron in the mtSSU rRNA locus of strains isolated from Abies sachalinensis (Fridr. Schmidt) M.T. Mast in Asia was also found in strains isolated from Abies balsamea (L.) Mill. in North America. Phylogenetic analyses yielded trees that grouped strains by host of origin with strong branch support. Asian strains of Gremmeniella abietina (Lagerberg) Morelet var. abietina isolated from fir (A. sachalinensis) were more closely related to G. abietina var. balsamea from North America, which is found on spruce (Picea spp.) and balsam fir, and European and North American races of G. abietina var. abietina from pines (Pinus spp.) were distantly related. Likewise, North American isolates of Gremmeniella laricina (Ettinger) O. Petrini, L.E. Petrini, G. Laflamme, & G.B. Ouellette, a pathogen of larch, were more closely related to G. laricina from Europe than to G. abietina var. abietina from North America. These data suggest that host specialization might have been the leading evolutionary force shaping Gremmeniella spp., with geographic separation acting as a secondary factor. Key words: Gremmeniella, geographic separation, host specialization, mitochondrial rRNA, nuclear genes.

Estimating Copy-Number Proportions: The Comeback of Sanger Sequencing

Genes ◽

10.3390/genes12020283 ◽

2021 ◽

Vol 12 (2) ◽

pp. 283

Author(s):

Eyal Seroussi

Keyword(s):

Copy Number ◽

Sanger Sequencing ◽

Cytosine Methylation ◽

Direct Sequencing ◽

Information Source ◽

Gene Copy Number ◽

Cost Effective ◽

Gene Copy ◽

Base Editing ◽

Recent Developments

Determination of the relative copy numbers of mixed molecular species in nucleic acid samples is often the objective of biological experiments, including Single-Nucleotide Polymorphism (SNP), indel and gene copy-number characterization, and quantification of CRISPR-Cas9 base editing, cytosine methylation, and RNA editing. Standard dye-terminator chromatograms are a widely accessible, cost-effective information source from which copy-number proportions can be inferred. However, the rate of incorporation of dye terminators is dependent on the dye type, the adjacent sequence string, and the secondary structure of the sequenced strand. These variable rates complicate inferences and have driven scientists to resort to complex and costly quantification methods. Because these complex methods introduce their own biases, researchers are rethinking whether rectifying distortions in sequencing trace files and using direct sequencing for quantification will enable comparably accurate assessment. Indeed, recent developments in software tools (e.g., TIDE, ICE, EditR, BEEP and BEAT) indicate that quantification based on direct Sanger sequencing is gaining scientific acceptance. This commentary reviews the common obstacles in quantification and the latest insights and developments relevant to estimating copy-number proportions based on direct Sanger sequencing, concluding that bidirectional sequencing and sophisticated base calling are the keys to identifying and avoiding sequence distortions.

Heteroalleles in Common Wheat: Multiple Differences between Allelic Variants of the Gli-B1 Locus

International Journal of Molecular Sciences ◽

10.3390/ijms22041832 ◽

2021 ◽

Vol 22 (4) ◽

pp. 1832

Author(s):

Eugene Metakovsky ◽

Laura Pascual ◽

Patrizia Vaccino ◽

Viktor Melnik ◽

Marta Rodriguez-Quijano ◽

...

Keyword(s):

Common Wheat ◽

Dna Sequences ◽

Fragment Length Polymorphism ◽

Snp Markers ◽

Group Iv ◽

Nucleotide Polymorphisms ◽

High Genetic Diversity ◽

Single Nucleotide ◽

Allelic Variants ◽

B Genome

The Gli-B1-encoded γ-gliadins and non-coding γ-gliadin DNA sequences for 15 different alleles of common wheat have been compared using seven tests: electrophoretic mobility (EM) and molecular weight (MW) of the encoded major γ-gliadin, restriction fragment length polymorphism patterns (RFLPs) (three different markers), known single nucleotide polymorphism (SNP) markers of the Gli-B1 γ-gliadin pseudogene, and sequencing of the pseudogene GAG56B. It was discovered that encoded γ-gliadins with contrasting EM had similar MWs. However, seven allelic variants (designated from I to VII) differed from one another in the other six tests: I (alleles Gli-B1i, k, m, o), II (Gli-B1n, q, s), III (Gli-B1b), IV (Gli-B1e, f, g), V (Gli-B1h), VI (Gli-B1d) and VII (Gli-B1a). Allele Gli-B1c (variant VIII) was identical to the alleles from group IV in four of the tests. Some tests might show a fine difference between alleles belonging to the same variant. Our results attest in favor of the independent origin of at least seven variants at the Gli-B1 locus that might originate from deeply diverged genotypes of the donor(s) of the B genome in hexaploid wheat and therefore might be called “heteroallelic”. The donor’s particularities at the Gli-B1 locus might be conserved since that time and decisively contribute to the current high genetic diversity of common wheat.

Meiotic Recombination Between Paralogous RBCSB Genes on Sister Chromatids of Arabidopsis thaliana

Genetics ◽

10.1093/genetics/166.2.947 ◽

2004 ◽

Vol 166 (2) ◽

pp. 947-957 ◽

Cited By ~ 1

Author(s):

John G Jelesko ◽

Kristy Carter ◽

Whitney Thompson ◽

Yuki Kinoshita ◽

Wilhelm Gruissem

Keyword(s):

Gene Cluster ◽

Dna Sequences ◽

Meiotic Recombination ◽

Sequence Similarity ◽

Specific Gene ◽

High Sequence Similarity ◽

Paralogous Genes ◽

Chimeric Genes ◽

Unequal Recombination ◽

Sister Chromatids

Paralogous genes organized as a gene cluster can rapidly evolve by recombination between misaligned paralogs during meiosis, leading to duplications, deletions, and novel chimeric genes. To model unequal recombination within a specific gene cluster, we utilized a synthetic RBCSB gene cluster to isolate recombinant chimeric genes resulting from meiotic recombination between paralogous genes on sister chromatids. Several F1 populations hemizygous for the synthRBCSB1 gene cluster gave rise to Luc+ F2 plants at frequencies ranging from 1 to 3 × 10⁻⁶. A nonuniform distribution of recombination resolution sites resulted in the biased formation of recombinant RBCS3B/1B::LUC genes with nonchimeric exons. The positioning of approximately half of the mapped resolution sites was effectively modeled by the fractional length of identical DNA sequences. In contrast, the other mapped resolution sites fit an alternative model in which recombination resolution was stimulated by an abrupt transition from a region of relatively high sequence similarity to a region of low sequence similarity. Thus, unequal recombination between paralogous RBCSB genes on sister chromatids created an allelic series of novel chimeric genes that effectively resulted in the diversification rather than the homogenization of the synthRBCSB1 gene cluster.

Micronuclear DNA sequences from Tetrahymena do not confer mitotic stability on ARS plasmids in Saccharomyces

Genome ◽

10.1139/g88-116 ◽

1988 ◽

Vol 30 (5) ◽

pp. 690-696 ◽

Cited By ~ 3

Author(s):

Wendy H. Horsfall ◽

Ronald E. Pearlman

Keyword(s):

Saccharomyces Cerevisiae ◽

Dna Sequences ◽

Copy Number ◽

Tetrahymena Thermophila ◽

Assay System ◽

Mitotic Stability ◽

Yeast Saccharomyces Cerevisiae ◽

Genomic Libraries ◽

Ade2 Gene

Genomic libraries containing micronuclear DNA sequences from Tetrahymena thermophila have been constructed in a vector containing ARS1, SUP11, and ura3 sequences from the yeast Saccharomyces cerevisiae. When transformed into a strain of S. cerevisiae carrying a suppressible ochre mutation in the ade2 gene, viable transformants are obtained only if the transforming plasmid is maintained at a copy number of one or two per cell. Mitotic segregation of the plasmid is easily assessed in a colour assay of transformants. Using this assay system, we showed that micronuclear DNA from Tetrahymena does not contain sequences that confer mitotic stability on yeast ARS-containing plasmids; i.e., sequences that function analogously to yeast centromere sequences. One transformant was analyzed that carries Tetrahymena sequences that maintain the copy number of the ARS plasmid at one or two per cell. However, these sequences do not confer mitotic stability on the transformants and they confer a phenotype in this assay similar to that of the REP3 gene of the yeast 2 μm plasmid. Key words: mitotic stability, centromere, Tetrahymena, Saccharomyces.
