Genome-wide SNP Data Reveal an Overestimation of Species Diversity in a Group of Hawkmoths

Anna K Hundsdoerfer; Kyung Min Lee; Ian J Kitching; Marko Mutanen

doi:10.1093/gbe/evz113

Genome Biology and Evolution ◽

10.1093/gbe/evz113 ◽

2019 ◽

Vol 11 (8) ◽

pp. 2136-2150 ◽

Cited By ~ 6

Author(s):

Anna K Hundsdoerfer ◽

Kyung Min Lee ◽

Ian J Kitching ◽

Marko Mutanen

Keyword(s):

Valid Species ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Contemporary Debate ◽

Snp Data ◽

Genome Wide ◽

Genome Wide Data ◽

Research Outcome ◽

Biological Entities ◽

Hyles Euphorbiae

Abstract The interface between populations and evolving young species continues to generate much contemporary debate in systematics depending on the species concept(s) applied but which ultimately reduces to the fundamental question of “when do nondiscrete entities become distinct, mutually exclusive evolutionary units”? Species are perceived as critical biological entities, and the discovery and naming of new species is perceived by many authors as a major research aim for assessing current biodiversity before much of it becomes extinct. However, less attention is given to determining whether these names represent valid biological entities because this is perceived as both a laborious chore and an undesirable research outcome. The charismatic spurge hawkmoths (Hyles euphorbiae complex, HEC) offer an opportunity to study this less fashionable aspect of systematics. To elucidate this intriguing systematic challenge, we analyzed over 10,000 ddRAD single nucleotide polymorphisms from 62 individuals using coalescent-based and population genomic methodology. These genome-wide data reveal a clear overestimation of (sub)species-level diversity and demonstrate that the HEC taxonomy has been seriously oversplit. We conclude that only one valid species name should be retained for the entire HEC, namely Hyles euphorbiae, and we do not recognize any formal subspecies or other taxonomic subdivisions within it. Although the adoption of genetic tools has frequently revealed morphologically cryptic diversity, the converse, taxonomic oversplitting of species, is generally (and wrongly in our opinion) accepted as rare. Furthermore, taxonomic oversplitting is most likely to have taken place in intensively studied popular and charismatic organisms such as the HEC.

Identification of genome-wide single-nucleotide polymorphisms among geographically diverse radish accessions

DNA Research ◽

10.1093/dnares/dsaa001 ◽

2020 ◽

Vol 27 (1) ◽

Author(s):

Hiroto Kobayashi ◽

Kenta Shirasawa ◽

Nobuko Fukino ◽

Hideki Hirakawa ◽

Takashi Akanuma ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Southeast Asia ◽

East Asia ◽

Near East ◽

Dna Polymorphisms ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

South And Southeast Asia ◽

Snp Data ◽

Genome Wide

Abstract Radish (Raphanus sativus L.) is cultivated around the world as a vegetable crop and exhibits diverse morphological and physiological features. DNA polymorphisms are responsible for differences in traits among cultivars. In this study, we determined genome-wide single-nucleotide polymorphisms (SNPs) among geographically diverse radish accessions using the double-digest restriction site-associated DNA sequencing (ddRAD-Seq) method. A total of 52,559 SNPs was identified in a collection of over 500 radish accessions (cultivated and wild) from East Asia, South and Southeast Asia, and the Occident and Near East. In addition, 2,624 SNP sites without missing data (referred to as common SNP sites) were identified among 510 accessions. Genetic diversity analyses, based on the common SNP sites, divided the cultivated radish accessions into four main groups, each derived from one of four geographical areas (Japan, East Asia, South and Southeast Asia, and the Occident and Near East). Furthermore, we discuss the origin of cultivated radish and its migration from the West to East Asia. SNP data generated in this work will facilitate further genetic studies on radish breeding and the production of DNA markers.
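The idea of "common SNP sites" in the abstract above (sites with no missing genotype in any accession) can be illustrated with a minimal sketch. The function name `common_snp_sites`, the 0/1/2 genotype coding, and the use of -1 as the missing-data code are assumptions made for this illustration; they are not taken from the study.

```python
import numpy as np

def common_snp_sites(genotypes, missing=-1):
    """Return column indices of SNP sites genotyped in every accession.

    genotypes: 2-D array, rows = accessions, columns = SNP sites.
    missing:   value marking a missing genotype call (assumed coding).
    """
    genotypes = np.asarray(genotypes)
    # A site is dropped if at least one accession has a missing call there.
    has_missing = (genotypes == missing).any(axis=0)
    return np.flatnonzero(~has_missing)

# Toy matrix: 3 accessions x 4 SNP sites, genotypes coded 0/1/2.
toy = np.array([
    [0, 1, -1, 2],
    [1, 1, 0, 2],
    [2, -1, 0, 0],
])
kept = common_snp_sites(toy)
print(kept)  # sites 0 and 3 have no missing calls
```

Requiring complete data is a strict filter: it trades a large number of sites for a matrix that needs no imputation before diversity analyses.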

Candidate-gene association study searching for genetic factors involved in migraine chronification

Cephalalgia ◽

10.1177/0333102414547141 ◽

2014 ◽

Vol 35 (6) ◽

pp. 500-507 ◽

Cited By ~ 9

Author(s):

MA Louter ◽

J Fernandez-Morales ◽

B de Vries ◽

B Winsvold ◽

V Anttila ◽

...

Keyword(s):

Association Study ◽

Genetic Factors ◽

Data Sets ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Candidate Gene Association ◽

Genome Wide ◽

Third Stage ◽

Genome Wide Data ◽

Two Stages

Introduction: Chronic migraine (CM) is at the severe end of the clinical migraine spectrum, but its genetic background is unknown. Our study searched for evidence that genetic factors are involved in the chronification process. Methods: We initially selected 144 single-nucleotide polymorphisms (SNPs) from 48 candidate genes, which we tested for association in two stages: the first stage encompassed 262 CM patients, the second investigated 226 patients with high-frequency migraine (HFM). Subsequently, SNPs with p values < 0.05 were forwarded to the replication stage containing 531 patients with CM or HFM. Results: Eight SNPs were significantly associated with CM and HFM in the two-stage phase. None survived replication in the third stage. Discussion: We present the first comprehensive genetic association study for migraine chronification. There were no significant findings. Future studies may benefit from larger, genome-wide data sets or should use other genetic approaches to identify genetic factors involved in migraine chronification.

Fast Principal Component Analysis of Large-Scale Genome-Wide Data

10.1101/002238 ◽

2014 ◽

Cited By ~ 2

Author(s):

Gad Abraham ◽

Michael Inouye

Keyword(s):

Principal Component Analysis ◽

Large Scale ◽

Principal Component ◽

Component Analysis ◽

Single Nucleotide ◽

Snp Data ◽

Genome Wide ◽

Genome Wide Data ◽

Eigen Decomposition ◽

Traditional Approaches

Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years and PCA of large datasets has become a time consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and on a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
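To make the notion of PCA "based on randomized algorithms" more concrete, here is a minimal sketch of a generic randomized range-finder followed by a small exact SVD. It is not flashpca's actual implementation; the function name `randomized_pca`, its parameters, and the default values are assumptions chosen for illustration only.

```python
import numpy as np

def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top principal components of X (samples x SNPs).

    Generic randomized SVD sketch, not the flashpca code.
    Returns (scores, singular_values) for the leading components.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each SNP column
    k = min(n_components + n_oversamples, min(Xc.shape))
    # Random projection: capture the dominant column space of Xc.
    Q = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))[0]
    # Power iterations sharpen the subspace; QR keeps it numerically stable.
    for _ in range(n_iter):
        Q = np.linalg.qr(Xc.T @ Q)[0]
        Q = np.linalg.qr(Xc @ Q)[0]
    # Exact SVD of the small projected matrix (k x SNPs).
    U_small, s, _ = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    U = Q @ U_small
    scores = U[:, :n_components] * s[:n_components]
    return scores, s[:n_components]

# Toy usage: 40 individuals x 300 SNPs, genotypes coded 0/1/2.
rng = np.random.default_rng(42)
genotypes = rng.integers(0, 3, size=(40, 300))
scores, singular_values = randomized_pca(genotypes, n_components=2)
print(scores.shape)  # (40, 2)
```

The saving comes from only ever decomposing matrices with `k` columns or rows, where `k` is slightly larger than the number of requested components, instead of the full genotype matrix.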

Prediction of Breast Cancer Treatment–Induced Fatigue by Machine Learning Using Genome-Wide Association Data

JNCI Cancer Spectrum ◽

10.1093/jncics/pkaa039 ◽

2020 ◽

Vol 4 (5) ◽

Author(s):

Sangkyu Lee ◽

Joseph O Deasy ◽

Jung Hun Oh ◽

Antonio Di Meglio ◽

Agnes Dumas ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Area Under The Curve ◽

Breast Cancer Treatment ◽

Nucleotide Polymorphisms ◽

Cognitive Fatigue ◽

Single Nucleotide ◽

False Discovery ◽

Genome Wide ◽

Genome Wide Data

Abstract Background: We aimed at predicting fatigue after breast cancer treatment using machine learning on clinical covariates and germline genome-wide data. Methods: We accessed germline genome-wide data of 2799 early-stage breast cancer patients from the Cancer Toxicity study (NCT01993498). The primary endpoint was defined as scoring zero at diagnosis and higher than quartile 3 at 1 year after primary treatment completion on European Organization for Research and Treatment of Cancer quality-of-life questionnaires for Overall Fatigue and on the multidimensional questionnaire for Physical, Emotional, and Cognitive fatigue. First, we tested univariate associations of each endpoint with clinical variables and genome-wide variants. Then, using preselected clinical (false discovery rate < 0.05) and genomic (P < .001) variables, a multivariable preconditioned random-forest regression model was built and validated on a hold-out subset to predict fatigue. Gene set enrichment analysis identified key biological correlates (MetaCore). All statistical tests were 2-sided. Results: Statistically significant clinical associations were found only with Emotional and Cognitive Fatigue, including receipt of chemotherapy, anxiety, and pain. Some single nucleotide polymorphisms had some degree of association (P < .001) with the different fatigue endpoints, although there were no genome-wide statistically significant (P < 5.00 × 10⁻⁸) associations. Only for Cognitive Fatigue, the predictive ability of the genomic multivariable model was statistically significantly better than random (area under the curve = 0.59, P = .01) and marginally improved with clinical variables (area under the curve = 0.60, P = .005). Single nucleotide polymorphisms found to be associated (P < .001) with Cognitive Fatigue belonged to genes linked to inflammation (false discovery rate adjusted P = .03), cognitive disorders (P = 1.51 × 10⁻¹²), and synaptic transmission (P = 6.28 × 10⁻⁸). Conclusions: Genomic analyses in this large cohort of breast cancer survivors suggest a possible genetic role for severe Cognitive Fatigue that warrants further exploration.

Phylogeography of Lyme borreliosis-group spirochetes and methicillin-resistant Staphylococcus aureus

Parasitology ◽

10.1017/s0031182012000741 ◽

2012 ◽

Vol 139 (14) ◽

pp. 1952-1965 ◽

Cited By ~ 11

Author(s):

GABRIELE MARGOS ◽

SANTIAGO CASTILLO-RAMÍREZ ◽

ANNE GATEWOOD HOEN

Keyword(s):

Lyme Borreliosis ◽

Housekeeping Genes ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Bacterial Typing ◽

Infected People ◽

Snp Data ◽

Genome Wide ◽

Global Epidemiology ◽

Zoonotic Bacteria

SUMMARY Multilocus sequence typing (MLST) and multilocus sequence analysis (MLSA) have revolutionized understanding the global epidemiology of many medically relevant bacteria utilizing a number, mostly seven, of housekeeping genes. A more recent introduction, single nucleotide polymorphisms (SNPs), constitutes an even more powerful tool for bacterial typing, population genetic studies and phylogeography. The introduction of massive parallel sequencing has made genome re-sequencing and SNP discovery more economical for investigations of microbial organisms. In this paper we review phylogeographic studies on Lyme borreliosis (LB)-group spirochetes and methicillin-resistant Staphylococcus aureus (MRSA). Members of the LB-group spirochetes are tick-transmitted zoonotic bacteria that have many hosts and differ in their degree of host specialism, constituting a highly complex system. MRSA is a directly transmitted pathogen that may be acquired by contact with infected people, animals or MRSA-contaminated objects. For the LB-group spirochetes, MLSA has proved a powerful tool for species assignment and phylogeographic investigations while for S. aureus, genome-wide SNP data have been used to study the very short-term evolution of two important MRSA lineages, ST239 and ST225. These data are detailed in this review.

EpiPen: An R Package to Investigate Two-Locus Epistatic Models

Twin Research and Human Genetics ◽

10.1017/thg.2014.25 ◽

2014 ◽

Vol 17 (4) ◽

Cited By ~ 2

Author(s):

Raymond K. Walters ◽

Charles Laurin ◽

Gitta H. Lubke

Keyword(s):

Power Analysis ◽

R Package ◽

Simulation Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Epistatic Interactions ◽

Model Interpretation ◽

Genome Wide ◽

Using Data ◽

Power Analyses

Epistasis is a growing area of research in genome-wide studies, but the differences between alternative definitions of epistasis remain a source of confusion for many researchers. One problem is that models for epistasis are presented in a number of formats, some of which have difficult-to-interpret parameters. In addition, the relation between the different models is rarely explained. Existing software for testing epistatic interactions between single-nucleotide polymorphisms (SNPs) does not provide the flexibility to compare the available model parameterizations. For that reason we have developed an R package for investigating epistatic and penetrance models, EpiPen, to aid users who wish to easily compare, interpret, and utilize models for two-locus epistatic interactions. EpiPen facilitates research on SNP-SNP interactions by allowing the R user to easily convert between common parametric forms for two-locus interactions, generate data for simulation studies, and perform power analyses for the selected model with a continuous or dichotomous phenotype. The usefulness of the package for model interpretation and power analysis is illustrated using data on rheumatoid arthritis.

Genome-wide SNPs redefines species boundaries and conservation units in the freshwater mussel genus Cyprogenia of North America

Scientific Reports ◽

10.1038/s41598-021-90325-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kyung Seok Kim ◽

Kevin J. Roe

Keyword(s):

Phylogenetic Analyses ◽

Freshwater Mussel ◽

Conservation Strategies ◽

Nucleotide Polymorphisms ◽

Conservation Units ◽

Genetic Structuring ◽

The North ◽

Snp Data ◽

Genome Wide ◽

Significant Difference

Abstract Detailed information on species delineation and population genetic structure is a prerequisite for designing effective restoration and conservation strategies for imperiled organisms. Phylogenomic and population genomic analyses based on genome-wide double digest restriction-site associated DNA sequencing (ddRAD-Seq) data have identified three allopatric lineages in the North American freshwater mussel genus Cyprogenia. Cyprogenia stegaria is restricted to the Eastern Highlands and displays little genetic structuring within this region. However, two allopatric lineages of C. aberti in the Ozark and Ouachita highlands exhibit substantial levels (mean uncorrected FST = 0.368) of genetic differentiation and each warrants recognition as a distinct evolutionary lineage. Lineages of Cyprogenia in the Ouachita and Ozark highlands are further subdivided, reflecting structuring at the level of river systems. Species tree inference and species delimitation in a Bayesian framework using single nucleotide polymorphism (SNP) data supported results from phylogenetic analyses, and support three species of Cyprogenia over the currently recognized two species. A comparison of SNPs generated from both destructively and non-destructively collected samples revealed no significant difference in the SNP error rate, quality and amount of ddRAD sequence reads, indicating that nondestructive or trace samples can be effectively utilized to generate SNP data for organisms for which destructive sampling is not permitted.

Genomic Analyses of Globodera pallida, A Quarantine Agricultural Pathogen in Idaho

Pathogens ◽

10.3390/pathogens10030363 ◽

2021 ◽

Vol 10 (3) ◽

pp. 363

Author(s):

Sulochana K. Wasala ◽

Dana K. Howe ◽

Louise-Marie Dandurand ◽

Inga A. Zasada ◽

Dee R. Denver

Keyword(s):

Genetic Variation ◽

Potato Production ◽

Globodera Pallida ◽

Fixation Index ◽

Parasitic Nematodes ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Genome Wide ◽

Field Samples ◽

Multiple Introduction

Globodera pallida is among the most significant plant-parasitic nematodes worldwide, causing major damage to potato production. Since it was discovered in Idaho in 2006, eradication efforts have aimed to contain and eradicate G. pallida through phytosanitary action and soil fumigation. In this study, we investigated genome-wide patterns of G. pallida genetic variation across Idaho fields to evaluate whether the infestation resulted from a single or multiple introduction(s) and to investigate potential evolutionary responses since the time of infestation. A total of 53 G. pallida samples (~1,042,000 individuals) were collected and analyzed, representing five different fields in Idaho, a greenhouse population, and a field in Scotland that was used for external comparison. According to genome-wide allele frequency and fixation index (Fst) analyses, most of the genetic variation was shared among the G. pallida populations in Idaho fields pre-fumigation, indicating that the infestation likely resulted from a single introduction. Temporal patterns of genome-wide polymorphisms involving (1) pre-fumigation field samples collected in 2007 and 2014 and (2) pre- and post-fumigation samples revealed nucleotide variants (SNPs, single-nucleotide polymorphisms) with significantly differentiated allele frequencies indicating genetic differentiation. This study provides insights into the genetic origins and adaptive potential of G. pallida invading new environments.

Genetic dissection of soybean partial resistance to sclerotinia stem rot through genome wide association study and high throughout single nucleotide polymorphisms

Genomics ◽

10.1016/j.ygeno.2020.10.042 ◽

2021 ◽

Author(s):

Yan Jing ◽

Weili Teng ◽

Lijuan Qiu ◽

Hongkun Zheng ◽

Wenbin Li ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Partial Resistance ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Stem Rot ◽

Genetic Dissection ◽

Nucleotide Polymorphisms ◽

Sclerotinia Stem Rot ◽

Single Nucleotide ◽

Genome Wide

Single Nucleotide Polymorphism Discovery and Genetic Differentiation Analysis of Geese Bred in Poland, Using Genotyping-by-Sequencing (GBS)

Genes ◽

10.3390/genes12071074 ◽

2021 ◽

Vol 12 (7) ◽

pp. 1074

Author(s):

Joanna Grzegorczyk ◽

Artur Gurgul ◽

Maria Oczkowicz ◽

Tomasz Szmatoła ◽

Agnieszka Fornal ◽

...

Keyword(s):

Genotyping By Sequencing ◽

Read Depth ◽

Model Organisms ◽

Single Nucleotide Polymorphism Discovery ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Polymorphism Discovery ◽

Genome Wide ◽

Plumage Development ◽

Edar Gene

Poland is the largest European producer of goose, and goose breeding has become an essential and still growing branch of the poultry industry. The most frequently bred goose is the White Kołuda® breed, constituting 95% of the country’s population, whereas geese of regional varieties are bred in smaller conservation flocks. However, goose genetic diversity remains insufficiently explored, mainly because the most commonly used tools are of limited use in non-model organisms. Single nucleotide polymorphisms (SNPs) are among the most accurate markers used in population genetics. A highly efficient strategy for genome-wide SNP detection is genotyping-by-sequencing (GBS), which has already been widely applied in many organisms. This study applies GBS to 12 conservation goose breeds and the White Kołuda® breed maintained in Poland. The GBS method allowed for the detection of 3833 common raw SNPs. After filtering for read depth and allele characters, the final marker panel used for the differentiation analysis comprised 791 SNPs. These variants were located within 11 different genes, and one of the most diversified variants was associated with the EDAR gene, which is especially interesting because it participates in plumage development, which plays a crucial role in goose breeding.
