Comparative Analysis of SNP Discovery and Genotyping in Fagus sylvatica L. and Quercus robur L. Using RADseq, GBS, and ddRAD Methods

Bartosz Ulaszewski; Joanna Meger; Jaroslaw Burczyk

doi:10.3390/f12020222

Forests ◽

10.3390/f12020222 ◽

2021 ◽

Vol 12 (2) ◽

pp. 222

Author(s):

Bartosz Ulaszewski ◽

Joanna Meger ◽

Jaroslaw Burczyk

Keyword(s):

Population Genomics ◽

De Novo ◽

Genetic Studies ◽

Genomic Libraries ◽

Reduced Representation ◽

Large Numbers ◽

Broadleaved Tree Species ◽

Fagus Sylvatica L ◽

Reference Genomes ◽

Future Population

Next-generation sequencing of reduced representation genomic libraries (RRL) can provide large numbers of genetic markers for population genetic studies at relatively low cost. However, one major concern with these types of markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely F. sylvatica and Q. robur. We found that the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs in future population genomics studies in beech and oak.
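The replicate-based reliability check mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotype encoding (0/1/2 alternate-allele counts, `None` for a missing call), the function name `concordant_locus_fraction`, and the rule that a locus counts as concordant only when both replicates carry the same call are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed encoding: 0, 1, 2 = number of alternate alleles; None = missing call.
Genotype = Optional[int]


def concordant_locus_fraction(rep_a: Sequence[Genotype],
                              rep_b: Sequence[Genotype]) -> float:
    """Fraction of loci called in both replicates that have identical genotypes."""
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same loci")
    compared = 0
    matches = 0
    for a, b in zip(rep_a, rep_b):
        if a is None or b is None:
            continue  # a missing call in either replicate is not compared
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no locus was called in both replicates")
    return matches / compared


# Hypothetical genotypes for one sample sequenced twice.
replicate_1 = [0, 1, 2, None, 1, 0]
replicate_2 = [0, 1, 1, 2, 1, 0]
fraction = concordant_locus_fraction(replicate_1, replicate_2)
print(f"{fraction:.2f}")  # 4 of the 5 comparable loci agree
```

Skipping loci with a missing call keeps missing data and genotyping error as separate quantities, which matters here because the abstract treats both as concerns.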

Shed skin as a source of DNA for genotyping-by-sequencing (GBS) in reptiles

10.1101/658989 ◽

2019 ◽

Author(s):

Thomas D Brekke ◽

Liam Shier ◽

Matthew J Hegarty ◽

John F Mulley

Keyword(s):

Genotyping By Sequencing ◽

Direct Result ◽

Good Source ◽

Genetic Studies ◽

Reduced Representation ◽

The Public ◽

Large Numbers ◽

Corn Snake ◽

Skin Samples ◽

Transitions And Transversions

Association and genetic mapping studies aimed at linking genotype to phenotype are powerful tools that require large numbers of samples, complicating their use in long-lived species with low fecundity. Shed skins of snakes and other reptiles contain DNA, are a safe and ethical way of non-invasively sampling large numbers of individuals, and provide a simple mechanism by which to involve the public in scientific research. Here we test whether the DNA in dried shed skins mailed to us by citizen scientists is suitable for reduced representation sequencing approaches, specifically genotyping-by-sequencing (GBS). We find that shed skin samples provide DNA of sufficient quality and quantity for GBS, although libraries from shed skin resulted in fewer sequenced reads than libraries from snap-frozen muscle and contained fewer variants (70,685 SNPs versus 97,724). This difference is a direct result of the lower read counts of the shed skin samples and can be rectified quite simply with deeper sequencing. Skin-derived libraries also have a very slightly (but significantly) different profile of transitions and transversions, suggesting that DNA damage occurs but is minimal. We conclude that shed skin-derived DNA is a good source of genomic DNA for a variety of genetic studies, and use it to identify sex-linked scaffolds in the corn snake genome.
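The transition/transversion profile referred to in this abstract rests on a simple classification of each SNP. The sketch below shows that classification and the resulting Ts/Tv ratio; the function names and the example SNP list are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return (ref in PURINES) == (alt in PURINES)


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions for (ref, alt) pairs."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Hypothetical SNP calls as (reference allele, alternate allele).
example_snps = [("A", "G"), ("C", "T"), ("A", "C"),
                ("G", "T"), ("T", "C"), ("G", "A")]
ratio = ts_tv_ratio(example_snps)
print(ratio)  # 4 transitions, 2 transversions
```

Comparing this ratio between library types is one simple way to look for a shifted substitution profile, although a formal comparison would need a statistical test on the underlying counts.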

From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species

BMC Genomics ◽

10.1186/s12864-019-5806-y ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 11

Author(s):

Belinda Wright ◽

Katherine A. Farquharson ◽

Elspeth A. McLennan ◽

Katherine Belov ◽

Carolyn J. Hogg ◽

...

Keyword(s):

Population Genomics ◽

Reduced Representation ◽

Wildlife Species ◽

Reference Genomes ◽

Reduced Representation Sequencing

Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering

BioMed Research International ◽

10.1155/2014/675158 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 46

Author(s):

Daniel C. Ilut ◽

Marie L. Nydam ◽

Matthew P. Hare

Keyword(s):

Population Genomics ◽

De Novo ◽

Genomic Data ◽

Null Alleles ◽

Great Promise ◽

Sequence Difference ◽

Reduced Representation ◽

Ciona Savignyi ◽

Optimum Threshold ◽

Generation Sequencing

Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de-novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data.
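The allele-splitting artifact described in this abstract can be shown with a toy example. The greedy, seed-based clustering below is a deliberately simplified stand-in for similarity-based de novo clustering, not the protocol the authors introduce; the sequences, function names, and thresholds are invented for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads, max_mismatches: int):
    """Assign each read to the first cluster whose seed is close enough."""
    clusters = []  # each cluster is a list; its first element is the seed
    for read in reads:
        for cluster in clusters:
            if hamming(read, cluster[0]) <= max_mismatches:
                cluster.append(read)
                break
        else:
            clusters.append([read])  # no seed was close enough
    return clusters


# Two hypothetical alleles of one locus that differ at two sites.
allele_1 = "ACGTACGTAC"
allele_2 = "ACGTTCGTAA"
reads = [allele_1, allele_2, allele_1, allele_2]

strict = greedy_cluster(reads, max_mismatches=1)   # alleles split apart
relaxed = greedy_cluster(reads, max_mismatches=2)  # alleles kept together
print(len(strict), len(relaxed))
```

With the strict threshold a heterozygous individual would appear as two homozygous loci (false homozygosity). Relaxing the threshold too far would instead merge paralogs into one cluster, which is the opposite error the abstract names.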

Genome-wide methylation sequencing identifies progression-related epigenetic drivers in myelodysplastic syndromes

Cell Death and Disease ◽

10.1038/s41419-020-03213-2 ◽

2020 ◽

Vol 11 (11) ◽

Author(s):

Jing-dong Zhou ◽

Ting-juan Zhang ◽

Zi-jun Xu ◽

Zhao-qun Deng ◽

Yu Gu ◽

...

Keyword(s):

Cancer Progression ◽

Myelodysplastic Syndromes ◽

Bisulfite Sequencing ◽

De Novo ◽

DNA Hypermethylation ◽

Reduced Representation ◽

Targeted Bisulfite Sequencing ◽

Specific PCR ◽

Genome Wide ◽

Potential Biomarker

The potential mechanism by which myelodysplastic syndromes (MDS) progress to acute myeloid leukemia (AML) remains poorly elucidated. Epigenetic alterations have been shown to play crucial roles in cancer pathogenesis and progression, including in MDS. However, few studies have explored whole-genome methylation alterations during MDS progression. Reduced representation bisulfite sequencing was conducted in four paired MDS/secondary AML (MDS/sAML) patients to explore the underlying methylation-associated epigenetic drivers in MDS progression. In these four paired MDS/sAML patients, cases at the sAML stage exhibited a significantly increased methylation level compared with the matched MDS stage. A total of 1090 differentially methylated fragments (DMFs) (441 hypermethylated and 649 hypomethylated) were identified as involved in MDS pathogenesis, whereas 103 DMFs (96 hypermethylated and 7 hypomethylated) were involved in MDS progression. Targeted bisulfite sequencing further identified aberrant GFRA1, IRX1, NPY, and ZNF300 methylation as frequent events in an additional group of de novo MDS and AML patients, of which only ZNF300 methylation was associated with ZNF300 expression. Subsequently, ZNF300 hypermethylation in larger cohorts of de novo MDS and AML patients was confirmed by real-time quantitative methylation-specific PCR. ZNF300 methylation could act as a potential biomarker for diagnosis and prognosis in MDS and AML patients. Functional experiments demonstrated the anti-proliferative and pro-apoptotic role of ZNF300 overexpression in the MDS-derived AML cell line SKM-1. Collectively, genome-wide DNA hypermethylation was a frequent event during MDS progression. Among these changes, ZNF300 methylation, a regulator of ZNF300 expression, acted as an epigenetic driver in MDS progression. These findings provided a theoretical basis for the use of demethylation drugs in MDS patients against disease progression.

A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes

BMC Genomics ◽

10.1186/1471-2164-15-1138 ◽

2014 ◽

Vol 15 (1) ◽

pp. 1138 ◽

Cited By ~ 112

Author(s):

Holly B Bratcher ◽

Craig Corton ◽

Keith A Jolley ◽

Julian Parkhill ◽

Martin CJ Maiden

Keyword(s):

Neisseria Meningitidis ◽

De Novo Assembly ◽

Population Genomics ◽

De Novo ◽

Genealogical Analysis

AStrap: identification of alternative splicing from transcript sequences without a reference genome

Bioinformatics ◽

10.1093/bioinformatics/bty1008 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2654-2656 ◽

Cited By ~ 5

Author(s):

Guoli Ji ◽

Wenbin Ye ◽

Yaru Su ◽

Moliang Chen ◽

Guangzao Huang ◽

...

Keyword(s):

Machine Learning ◽

Alternative Splicing ◽

Single Molecule ◽

Reference Genome ◽

De Novo ◽

Supplementary Information ◽

Model Organisms ◽

Sequencing Data ◽

Extensive Evaluation ◽

Reference Genomes

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information: Supplementary data are available at Bioinformatics online.

A Systematic Review on Extreme Phenotype Strategies to Search for Rare Variants in Genetic Studies of Complex Disorders

10.20944/preprints202007.0583.v1 ◽

2020 ◽

Author(s):

Sana Amanat ◽

Teresa Requena ◽

Jose Antonio Lopez-Escamez

Keyword(s):

Systematic Review ◽

Candidate Genes ◽

Rare Variants ◽

De Novo ◽

Complex Diseases ◽

De Novo Mutations ◽

Genetic Studies ◽

Complex Disorders ◽

Extreme Phenotype ◽

Age Related

Exome sequencing has been commonly used in rare diseases by selecting multiplex families or singletons with an extreme phenotype (EP) to search for rare variants in coding regions. The EP strategy covers both extreme ends of a disease spectrum and has also been used to investigate the contribution of rare variants to heritability in complex clinical traits. We have conducted a systematic review to find evidence supporting the use of EP strategies to search for rare variants in genetic studies of complex diseases and to highlight the contribution of rare variation to the genetic structure of multiallelic conditions. After performing the quality assessment of the retrieved records, we selected 19 genetic studies considering EP to demonstrate genetic association. All the studies successfully identified several rare variants, and de novo mutations and many novel candidate genes were also identified by selecting an EP. There is enough evidence to support that the EP approach in patients with an early onset of the disease can contribute to the identification of rare variants in candidate genes or pathways involved in complex diseases. EP patients may contribute to a better understanding of the underlying genetic architecture of common heterogeneous disorders such as tinnitus or age-related hearing loss.

dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

10.7287/peerj.preprints.314 ◽

2014 ◽

Author(s):

Jonathan Puritz ◽

Christopher M. Hollenbeck ◽

John R. Gold

Keyword(s):

Population Genomics ◽

De Novo ◽

Variant Calling ◽

Population Level ◽

Model Organisms ◽

Effective Population ◽

Reduction Techniques ◽

Indel Polymorphisms ◽

Indel Calling ◽

Population Sizes

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely because dDocent quality-trims reads instead of filtering them and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.

Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato

10.1101/000208 ◽

2013 ◽

Author(s):

Nicola Nadeau ◽

Mayte Ruiz ◽

Patricio Salazar ◽

Brian Counterman ◽

Jose Alejandro Medina ◽

...

Keyword(s):

Population Genomics ◽

De Novo ◽

Reference Sequence ◽

Colour Pattern ◽

Adaptive Divergence ◽

Population Divergence ◽

Hybrid Zones ◽

Data Alignment ◽

Parallel Hybrid ◽

Genomic Regions

Hybrid zones can be valuable tools for studying evolution and identifying genomic regions responsible for adaptive divergence and underlying phenotypic variation. Hybrid zones between subspecies of Heliconius butterflies can be very narrow and are maintained by strong selection acting on colour pattern. The co-mimetic species H. erato and H. melpomene have parallel hybrid zones where both species undergo a change from one colour pattern form to another. We use restriction-associated DNA sequencing to obtain several thousand genome-wide sequence markers and use these to analyse patterns of population divergence across two pairs of parallel hybrid zones in Peru and Ecuador. We compare two approaches for analysis of this type of data: alignment to a reference genome and de novo assembly. We find that alignment gives the best results for species both closely (H. melpomene) and distantly (H. erato, ~15% divergent) related to the reference sequence. Our results confirm that the colour pattern controlling loci account for the majority of divergent regions across the genome, but we also detect other divergent regions apparently unlinked to colour pattern differences. We also use association mapping to identify previously unmapped colour pattern loci, in particular the Ro locus. Finally, we identify within our sample a new cryptic population of H. timareta in Ecuador, which occurs at relatively low altitude and is mimetic with H. melpomene malleti.

Stacks 2: Analytical Methods for Paired-end Sequencing Improve RADseq-based Population Genomics

10.1101/615385 ◽

2019 ◽

Cited By ~ 11

Author(s):

Nicolas C. Rochette ◽

Angel G. Rivera-Colón ◽

Julian M. Catchen

Keyword(s):

Population Genomics ◽

De Novo ◽

Simulated Data ◽

Natural World ◽

Massively Parallel ◽

Genetic Knowledge ◽

Type II ◽

Short Read Sequencing ◽

The Family ◽

Paired End Sequencing

For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads and overlap those contigs with the corresponding single-end loci. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we test the software with simulated data and compare reference-aligned and de novo analyses of three empirical datasets. We show that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.
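The general idea of building a contig from a de Bruijn graph can be sketched in a few lines. This toy example is not Stacks' assembler, which must also handle sequencing errors, coverage, and branching; the reads, the k-mer size, and the function names are invented for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k: int):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start: str) -> str:
    """Follow edges from `start` while exactly one unvisited successor exists."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (successor,) = graph[node]
        if successor in visited:
            break  # stop rather than loop around a cycle
        contig += successor[-1]
        visited.add(successor)
        node = successor
    return contig


# Hypothetical overlapping reads drawn from one locus.
reads = ["ACGTTG", "GTTGCA", "TGCAAT"]
graph = de_bruijn_graph(reads, k=4)
contig = extend_contig(graph, "ACG")
print(contig)
```

The walk stops at the first branch or dead end, which is why real assemblers add steps for resolving branches caused by heterozygous sites and errors.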
