Mining SNPs From EST Databases

Leslie Picoult-Newberg; Trey E. Ideker; Mark G. Pohl; Scott L. Taylor; Miriam A. Donaldson; Deborah A. Nickerson; Michael Boyce-Jacino

doi:10.1101/gr.9.2.167

Genome Research ◽

10.1101/gr.9.2.167 ◽

1999 ◽

Vol 9 (2) ◽

pp. 167-174 ◽

Cited By ~ 12

Keyword(s):

Expressed Sequence Tag ◽

De Novo ◽

cDNA Libraries ◽

Human Populations ◽

Data Sets ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Using Data

There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different cDNA libraries. [The SNPs identified in this study can be found in the National Center for Biotechnology Information (NCBI) SNP database under submitter handles ORCHID (SNPS-981210-A) and debnick (SNPS-981209-A and SNPS-981209-B).]
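The mismatch-based idea can be illustrated with a minimal Python sketch, assuming the ESTs of one cluster have already been assembled into a gapped multiple alignment of equal-length strings. The function name `candidate_snp_sites` and the `min_allele_count` filter (each allele must be seen in at least two ESTs, to discard single-read sequencing errors) are hypothetical; the abstract does not state the authors' exact criteria.

```python
from collections import Counter

BASES = set("ACGT")


def candidate_snp_sites(aligned_ests, min_allele_count=2):
    """Return (column, allele_counts) for alignment columns that show a mismatch.

    aligned_ests: equal-length strings from one EST cluster; '-' marks a gap or
    no coverage. A column qualifies when at least two different bases are each
    observed in min_allele_count or more ESTs (a hypothetical error filter).
    """
    if not aligned_ests:
        return []
    width = len(aligned_ests[0])
    if any(len(s) != width for s in aligned_ests):
        raise ValueError("all aligned ESTs must have the same length")
    sites = []
    for col in range(width):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(s[col].upper() for s in aligned_ests if s[col].upper() in BASES)
        supported = [base for base, n in counts.items() if n >= min_allele_count]
        if len(supported) >= 2:
            sites.append((col, dict(counts)))
    return sites
```

Raising `min_allele_count` trades sensitivity for fewer false candidates caused by sequencing errors in single-pass ESTs.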

EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Twin Research and Human Genetics ◽

10.1017/thg.2014.25 ◽

2014 ◽

Vol 17 (4) ◽

Cited By ~ 2

Author(s):

Raymond K. Walters ◽

Charles Laurin ◽

Gitta H. Lubke

Keyword(s):

Power Analysis ◽

R Package ◽

Simulation Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Epistatic Interactions ◽

Model Interpretation ◽

Genome Wide ◽

Using Data ◽

Power Analyses

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.
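As an illustration of the kind of model such a package handles (this is not EpiPen's API), the following hypothetical Python sketch builds a two-locus penetrance table under one assumed parameterization, a multiplicative model with an interaction ratio `r_int`, and computes the population prevalence it implies, assuming Hardy-Weinberg equilibrium at each locus and linkage equilibrium between loci.

```python
def hwe_genotype_freqs(p):
    """Frequencies of 0, 1, 2 risk-allele copies under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def multiplicative_penetrance(baseline, r1, r2, r_int):
    """3x3 table f[i][j] = baseline * r1**i * r2**j * r_int**(i*j).

    i and j count risk alleles at locus 1 and locus 2; r_int != 1 encodes
    epistasis on the multiplicative scale (hypothetical parameterization).
    """
    table = [[baseline * r1**i * r2**j * r_int**(i * j) for j in range(3)] for i in range(3)]
    if any(f > 1 for row in table for f in row):
        raise ValueError("penetrance exceeds 1; lower the baseline or risk ratios")
    return table


def prevalence(table, p1, p2):
    """Disease prevalence implied by the table, assuming HWE and linkage equilibrium."""
    g1, g2 = hwe_genotype_freqs(p1), hwe_genotype_freqs(p2)
    return sum(table[i][j] * g1[i] * g2[j] for i in range(3) for j in range(3))
```

With all ratios equal to 1 the prevalence reduces to the baseline penetrance, which is a quick sanity check when converting between parameterizations.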

From raw reads to trees: Whole genome SNP phylogenetics across the tree of life

10.1101/032250 ◽

2015 ◽

Cited By ~ 10

Author(s):

Sanaa Afroz Ahmed ◽

Chien-Chi Lo ◽

Po-E Li ◽

Karen W Davenport ◽

Patrick S.G. Chain

Keyword(s):

Ad Hoc ◽

Phylogenetic Reconstruction ◽

Clinical Samples ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Complex Samples ◽

Phylogenetic Characterization ◽

Genome Wide ◽

Genome Assemblies

Next-generation sequencing is increasingly being used to examine closely related organisms. However, while genome-wide single nucleotide polymorphisms (SNPs) provide an excellent resource for phylogenetic reconstruction, evolutionary analyses to date have been performed using different ad hoc methods that are often not widely applicable across projects. To facilitate the construction of robust phylogenies, we have developed a method for genome-wide identification and characterization of SNPs from sequencing reads and genome assemblies. Our phylogenetic and molecular evolutionary (PhaME) analysis software is unique in its ability to take reads and draft/complete genome(s) as input, derive core genome alignments, identify SNPs, construct phylogenies and perform evolutionary analyses. Several examples using genomes and read datasets for bacterial, eukaryotic and viral lineages demonstrate the broad and robust functionality of PhaME. Furthermore, the ability to incorporate raw metagenomic reads from clinical samples with suspected infectious agents shows promise for the rapid phylogenetic characterization of pathogens within complex samples.
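The core-SNP step can be sketched as follows, assuming the genomes are already in a core genome alignment held as a dict of equal-length strings. This is a hypothetical illustration, not PhaME's code: it keeps only columns where every genome has an unambiguous base, reports the variable ones, and counts pairwise SNP differences that a distance-based tree method could consume.

```python
from itertools import combinations


def core_snp_columns(alignment):
    """Indices of variable columns in the shared core of a genome alignment.

    alignment: dict mapping genome name -> aligned sequence (equal lengths).
    Columns with a gap or ambiguity code in any genome are excluded.
    """
    seqs = [s.upper() for s in alignment.values()]
    if not seqs:
        return []
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("all aligned genomes must have the same length")
    cols = []
    for c in range(width):
        bases = {s[c] for s in seqs}
        # Core column: every genome has A/C/G/T here; variable: more than one base.
        if bases <= set("ACGT") and len(bases) > 1:
            cols.append(c)
    return cols


def pairwise_snp_distances(alignment):
    """Number of core SNP positions at which each pair of genomes differs."""
    cols = core_snp_columns(alignment)
    upper = {name: seq.upper() for name, seq in alignment.items()}
    return {
        (a, b): sum(upper[a][c] != upper[b][c] for c in cols)
        for a, b in combinations(upper, 2)
    }
```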

Genome-wide identification, phylogenetic and expression pattern analysis of GATA family genes in Brassica napus

BMC Plant Biology ◽

10.1186/s12870-020-02752-2 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Weizhuo Zhu ◽

Yiyi Guo ◽

Yeke Chen ◽

Dezhi Wu ◽

Lixi Jiang

Keyword(s):

Brassica Napus ◽

Stress Condition ◽

Expression Patterns ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Good Opportunity ◽

Domain Structures ◽

Genome Wide ◽

A Genome

Abstract Background: GATA transcription factors are involved in plant developmental processes and respond to environmental stresses by binding DNA regulatory regions to regulate their downstream genes. However, little information on the GATA genes in Brassica napus is available. The release of the B. napus reference genome provides a good opportunity to perform a genome-wide characterization of GATA family genes in rapeseed. Results: In this study, 96 GATA genes randomly distributed across 19 chromosomes were identified in B. napus and classified into four subfamilies based on phylogenetic analysis and their domain structures. The amino acid sequences of BnGATAs diverged markedly among the four subfamilies in terms of their GATA domains, structures and motif compositions. Gene duplication and synteny between the genomes of B. napus and A. thaliana were also analyzed to provide insights into evolutionary characteristics. Moreover, BnGATAs showed different expression patterns in various tissues and under diverse abiotic stresses. The single nucleotide polymorphism (SNP) distributions of BnGATAs in a core germplasm collection are probably associated with functional disparity under environmental stress in different B. napus genotypes. Conclusion: The present study investigated the genomic structures, evolutionary features, expression patterns and SNP distributions of 96 BnGATAs. The results enrich our understanding of the GATA genes in rapeseed.

Confirming Single Nucleotide Polymorphisms from Expressed Sequence Tag Datasets Derived from Three Cattle cDNA Libraries

BMB Reports ◽

10.5483/bmbrep.2006.39.2.183 ◽

2006 ◽

Vol 39 (2) ◽

pp. 183-188 ◽

Cited By ~ 7

Author(s):

Seung-Hwan Lee ◽

Eung-Woo Park ◽

Yong-Min Cho ◽

Ji-Woong Lee ◽

Hyoung-Yong Kim ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Expressed Sequence Tag ◽

cDNA Libraries ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Expressed Sequence

Chromosome-Wide Characterization of Intragenic Crossover in Shiitake Mushroom, Lentinula edodes

Journal of Fungi ◽

10.3390/jof7121076 ◽

2021 ◽

Vol 7 (12) ◽

pp. 1076

Author(s):

Wenbing Gong ◽

Nan Shen ◽

Lin Zhang ◽

Yinbing Bian ◽

Yang Xiao

Keyword(s):

Lentinula Edodes ◽

Critical Role ◽

Central Component ◽

Nucleotide Polymorphisms ◽

Single Spore ◽

Single Nucleotide ◽

Genome Wide ◽

Cis And Trans ◽

QTL Mapping

Meiotic crossover plays a critical role in generating genetic variation and is a central component of breeding. However, our understanding of crossover in mushroom-forming fungi is limited. Here, in Lentinula edodes, we characterized chromosome-wide intragenic crossovers using single-nucleotide polymorphism (SNP) datasets from an F1 haploid progeny. A total of 884 intragenic crossovers were identified in 110 single-spore isolates, the majority of which were located closer to transcript start sites. About 71.5% of the intragenic crossovers were clustered into 65 crossover hotspots. A 10 bp motif (GCTCTCGAAA) was significantly enriched in the hotspot regions. Crossover frequencies around the mating-type A (MAT-A) loci were enhanced and formed a hotspot in L. edodes. Genome-wide quantitative trait locus (QTL) mapping identified sixteen crossover-QTLs, explaining 8.5–29.1% of the variation. Most of the detected crossover-QTLs were co-located with crossover hotspots. Both cis- and trans-QTLs contributed to the nonuniform distribution of crossovers along chromosomes. On chr2, we identified a QTL hotspot that regulated local and global crossover variation as well as a crossover hotspot in L. edodes. These findings provide a comprehensive view of the crossover landscape in L. edodes and advance our understanding of the conservation and diversity of meiotic recombination in mushroom-forming fungi.
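A simple way to locate crossovers in haploid progeny can be sketched in Python, assuming each SNP in a single-spore isolate has already been assigned a parental origin ('A' or 'B', or None when the call is missing). A crossover is placed between adjacent informative SNPs whose origin switches, and here it is called intragenic when both flanking SNPs lie inside one annotated gene. The function names and this classification rule are hypothetical; the abstract does not give the paper's exact criteria.

```python
def crossover_intervals(positions, origins):
    """Return (left_pos, right_pos) intervals where the parental origin switches.

    positions: sorted SNP coordinates on one chromosome.
    origins: parental origin ('A' or 'B') of each SNP in one haploid isolate,
    or None where the genotype is missing; missing calls are skipped.
    """
    calls = [(p, o) for p, o in zip(positions, origins) if o is not None]
    return [(p1, p2) for (p1, o1), (p2, o2) in zip(calls, calls[1:]) if o1 != o2]


def intragenic(intervals, genes):
    """Keep crossover intervals lying entirely inside one gene.

    genes: list of (start, end) coordinates, inclusive (hypothetical annotation).
    """
    return [(l, r) for l, r in intervals if any(s <= l and r <= e for s, e in genes)]
```

Because a haploid spore carries exactly one parental allele at each SNP, no heterozygous calls need to be handled, which keeps the switch detection to a single pass per chromosome.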

Genome-wide identification, phylogenetic and expression pattern analysis of GATA family genes in Brassica napus

10.21203/rs.3.rs-30607/v3 ◽

2020 ◽

Author(s):

Weizhuo Zhu ◽

Yiyi Guo ◽

Yeke Chen ◽

Dezhi Wu ◽

Lixi Jiang

Keyword(s):

Brassica Napus ◽

Stress Condition ◽

Expression Patterns ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Good Opportunity ◽

Domain Structures ◽

Genome Wide ◽

A Genome

Abstract Background: GATA transcription factors are involved in plant developmental processes and respond to environmental stresses by binding DNA regulatory regions to regulate their downstream genes. However, little information on the GATA genes in Brassica napus is available. The release of the B. napus reference genome provides a good opportunity to perform a genome-wide characterization of GATA family genes in rapeseed. Results: In this study, 96 GATA genes randomly distributed across 19 chromosomes were identified in B. napus and classified into four subfamilies based on phylogenetic analysis and their domain structures. The amino acid sequences of BnGATAs diverged markedly among the four subfamilies in terms of their GATA domains, structures and motif compositions. Gene duplication and synteny between the genomes of B. napus and A. thaliana were also analyzed to provide insights into evolutionary characteristics. Moreover, BnGATAs showed different expression patterns in various tissues and under diverse abiotic stresses. The single nucleotide polymorphism (SNP) distributions of BnGATAs in a core germplasm collection are probably associated with functional disparity under environmental stress in different B. napus genotypes. Conclusion: The present study investigated the genomic structures, evolutionary features, expression patterns and SNP distributions of 96 BnGATAs. The results enrich our understanding of the GATA genes in rapeseed.

Epi2Loc: An R Package to Investigate Two-Locus Epistatic Models

Twin Research and Human Genetics ◽

10.1017/thg.2014.38 ◽

2014 ◽

Vol 17 (4) ◽

pp. 272-278 ◽

Cited By ~ 1

Author(s):

Raymond K. Walters ◽

Charles Laurin ◽

Gitta H. Lubke

Keyword(s):

Power Analysis ◽

R Package ◽

Simulation Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Epistatic Interactions ◽

Model Interpretation ◽

Genome Wide ◽

Using Data ◽

Power Analyses

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, Epi2Loc, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. Epi2Loc facilitates research on SNP–SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.

Candidate-gene association study searching for genetic factors involved in migraine chronification

Cephalalgia ◽

10.1177/0333102414547141 ◽

2014 ◽

Vol 35 (6) ◽

pp. 500-507 ◽

Cited By ~ 9

Author(s):

MA Louter ◽

J Fernandez-Morales ◽

B de Vries ◽

B Winsvold ◽

V Anttila ◽

...

Keyword(s):

Association Study ◽

Genetic Factors ◽

Data Sets ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Candidate Gene Association ◽

Genome Wide ◽

Third Stage ◽

Genome Wide Data ◽

Two Stages

Introduction: Chronic migraine (CM) is at the severe end of the clinical migraine spectrum, but its genetic background is unknown. Our study searched for evidence that genetic factors are involved in the chronification process. Methods: We initially selected 144 single-nucleotide polymorphisms (SNPs) from 48 candidate genes and tested them for association in two stages: the first stage encompassed 262 CM patients, and the second investigated 226 patients with high-frequency migraine (HFM). SNPs with p values < 0.05 were then forwarded to a replication stage comprising 531 patients with CM or HFM. Results: Eight SNPs were significantly associated with CM and HFM in the two-stage phase. None survived replication in the third stage. Discussion: We present the first comprehensive genetic association study of migraine chronification. There were no significant findings. Future studies may benefit from larger, genome-wide data sets or should use other genetic approaches to identify genetic factors involved in migraine chronification.
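The staged design can be sketched as a small filter. The discovery threshold of p < 0.05 comes from the abstract; the Bonferroni-corrected threshold used for replication, the function name and the SNP identifiers in the test are hypothetical assumptions, since the abstract does not state the replication criterion.

```python
def forward_and_replicate(discovery_p, replication_p, alpha=0.05):
    """Forward SNPs with discovery p < alpha, then test them in replication.

    discovery_p, replication_p: dicts mapping SNP id -> p value.
    Replication uses alpha divided by the number of forwarded SNPs
    (a hypothetical Bonferroni correction); untested SNPs count as p = 1.
    """
    forwarded = sorted(snp for snp, p in discovery_p.items() if p < alpha)
    if not forwarded:
        return forwarded, []
    threshold = alpha / len(forwarded)
    replicated = [snp for snp in forwarded if replication_p.get(snp, 1.0) < threshold]
    return forwarded, replicated
```

Correcting only for the forwarded SNPs, rather than all tested SNPs, is what gives a staged design its power advantage over testing everything at once.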

Comparative bioinformatic analysis of genes expressed in common bean (Phaseolus vulgaris L.) seedlings

Genome ◽

10.1139/g05-010 ◽

2005 ◽

Vol 48 (3) ◽

pp. 562-570 ◽

Cited By ~ 31

Author(s):

Maeli Melotto ◽

Claudia B Monteiro-Vitorello ◽

Adriano G Bruschi ◽

Luis E.A Camargo

Keyword(s):

Phaseolus Vulgaris ◽

Common Bean ◽

Expressed Sequence Tag ◽

Gene Families ◽

Bioinformatic Analysis ◽

Colletotrichum Lindemuthianum ◽

cDNA Libraries ◽

Nucleotide Polymorphisms ◽

Phaseolus Vulgaris L ◽

Single Nucleotide

To rapidly and cost-effectively generate gene expression data, we developed an annotated unigene database of common bean (Phaseolus vulgaris L.). In this study, 3 cDNA libraries were constructed from the bean breeding line SEL1308: 1 from young leaves and 2 from seedlings inoculated or not inoculated with the fungal pathogen Colletotrichum lindemuthianum (Sacc. & Magnus) Briosi & Cavara, which causes anthracnose in common bean. To date, 5255 single-pass sequences have been included in the database after selection based on sequence quality. These ESTs were trimmed and clustered using the computer programs Phred and CAP3 to form a unigene collection of 3126 unique sequences. Within clusters, 318 single nucleotide polymorphisms (SNPs) and 68 insertions–deletions (indels) were found, indicating the presence of paralogous gene families in our database. Each unigene sequence was analyzed for possible function based on its similarity to known genes represented in the GenBank database and classified into 14 categories. Only 314 unigenes showed significant similarity to Phaseolus genomic sequences and P. vulgaris ESTs, which indicates that 90% (2818 unigenes) of our database represent newly discovered common bean genes. In addition, 12% (387 unigenes) were shown to be specific to common bean. This study represents a first step towards the discovery of novel genes in beans and provides a valuable source of molecular markers for expressed gene tagging and mapping. Key words: expressed sequence tag (EST), Colletotrichum lindemuthianum, Phaseolus vulgaris, simple sequence repeat (SSR), single nucleotide polymorphism (SNP).

Characterization of expressed sequence tag-derived single-nucleotide polymorphisms in the bay scallop Argopecten irradians irradians

Fisheries Science ◽

10.1007/s12562-009-0167-0 ◽

2009 ◽

Vol 75 (6) ◽

pp. 1389-1400 ◽

Cited By ~ 6

Author(s):

Ronghua Li ◽

Qi Li ◽

Lingfeng Kong

Keyword(s):

Single Nucleotide Polymorphisms ◽

Expressed Sequence Tag ◽

Argopecten Irradians ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Bay Scallop ◽

Expressed Sequence
