Leveraging targeted sequencing for non-model species: a step-by-step guide to obtain a reduced SNP set and a pipeline to automate data processing in the Antarctic Midge, Belgica antarctica

2019 ◽  
Author(s):  
Vitor A. C. Pavinato ◽  
Saranga Wijeratne ◽  
Drew Spacht ◽  
David L. Denlinger ◽  
Tea Meulia ◽  
...  

Abstract: The sequencing of whole or partial (e.g. reduced representation) genomes is commonly employed in molecular ecology and conservation genetics studies. However, due to sequencing costs, a trade-off between the number of samples and genome coverage can hinder research for non-model organisms. Furthermore, the processing of raw sequences requires familiarity with coding and bioinformatic tools that is not always available. Here, we present a guide for isolating a set of short, SNP-containing genomic regions for use with targeted amplicon sequencing protocols. We also present a Python pipeline, PypeAmplicon, that facilitates processing of reads to individual genotypes. We demonstrate the applicability of our method by generating an informative set of amplicons for genotyping of the Antarctic midge, Belgica antarctica, an endemic dipteran species of the Antarctic Peninsula. Our pipeline analyzed raw sequences produced by a combination of highly multiplexed PCR and next-generation sequencing. A total of 38 out of 47 (81%) amplicons designed by our panel were recovered, allowing successful genotyping of 42 out of 55 (76%) targeted SNPs. The sequencing of ∼150 bp around the targeted SNPs also uncovered 80 new SNPs, which complemented our analyses. By comparing overall patterns of genetic diversity and population structure of amplicon data with the low-coverage, whole-genome re-sequencing (lcWGR) data used to isolate the informative amplicons, we were able to demonstrate that amplicon sequencing produces information and results similar to those of lcWGR. Our methods will benefit other research programs where rapid development of population genetic data is needed but is prevented by high expense and a lack of bioinformatic experience.
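The recovery percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `recovery_rate` is hypothetical and is not part of PypeAmplicon; the counts are the ones reported above.

```python
def recovery_rate(recovered: int, designed: int) -> float:
    """Return the percentage of designed targets that were recovered."""
    if designed <= 0:
        raise ValueError("designed must be a positive count")
    return 100.0 * recovered / designed


# Counts taken from the abstract: 38 of 47 amplicons, 42 of 55 targeted SNPs.
amplicon_rate = recovery_rate(38, 47)
snp_rate = recovery_rate(42, 55)

print(f"amplicons recovered: {amplicon_rate:.0f}%")
print(f"targeted SNPs genotyped: {snp_rate:.0f}%")
```

Both values round to the figures given in the abstract (81% and 76%).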

Author(s):  
Levente Laczkó ◽  
Sándor Jordán ◽  
Gábor Sramkó

Different versions of Restriction-site Associated DNA sequencing (RADseq) have become powerful and popular tools in molecular ecology. Although RADseq datasets are regarded as representative of the nuclear genome, reduced representation genomic libraries may also sample the organellar (mitochondrial and, in the case of plants, plastid) DNA. Extraction of organellar loci from RADseq data can provide additional insights into the phylogenetics of the study group at no additional sequencing effort. Cytoplasmic genetic variance can help to better understand evolutionary history by uncovering past hybridization and identifying the maternal (or, rarely, the paternal) lineage due to rapid lineage sorting. We developed a pipeline in bash that is based on existing bioinformatic tools to automatically mine and genotype organellar loci contained in RADseq libraries. The utility of our pipeline is tested on eight publicly available datasets spanning different phylogenetic levels (i.e. from family-level phylogenies to phylogeography) and RADseq methods (sdRAD, ddRAD, ezRAD, GBS) for genotyping both mitochondrial and plastid loci, which were subject to phylogenetic tree reconstruction. In all cases, organellar phylogenies adequately supplemented the original studies either by corroborating the large-scale picture based on RADseq or by bringing additional evidence on past or contemporary hybridization. RADseq methods designed to achieve a larger horizontal coverage (i.e. ddRAD, ezRAD) evidently yielded longer organellar alignments, but sdRAD and GBS still provided useful polymorphic loci found in the cytoplasmic DNA. Our newly developed pipeline for the above purpose can be run under a Unix-like operating system and is freely accessible at https://github.com/laczkol/RADOrgMiner


2020 ◽  
Author(s):  
Kazunari Matsudaira ◽  
Takafumi Ishida

Abstract: Gibbons in the genus Hylobates, which live in Southeast Asia, show great diversity, comprising seven to nine species. Natural hybridisation has been observed in the species contact zones, although the roles played by hybridisation and introgression in the evolution of these species remain unclear. To uncover the divergence history and the contributions of hybridisation and introgression to the evolution of Hylobates, random amplicon sequencing-direct (GRAS-Di) analysis was employed to genotype 47 gibbons, representing eight species from three genera. After quality filtering, over 300,000 autosomal single-nucleotide variant (SNV) sites were identified. The SNV-based autosomal phylogeny, together with the mitochondrial phylogeny, supported a divergence pattern beginning approximately 4.3 million years ago. First, the mainland species, H. pileatus and H. lar, consecutively diverged from the Sundaic island species. Second, H. moloch, in Java (and likely H. klossii, in the Mentawai Islands) diverged from the other species. Third, H. muelleri, in Borneo, and H. agilis/H. albibarbis, in Sumatra and southwestern Borneo, diverged. Lastly, H. agilis and H. albibarbis diverged from each other. Patterson's D-statistics indicated significant introgression between H. lar and H. pileatus, between H. lar and H. agilis, and between H. albibarbis and H. muelleri. Weak introgression was identified between H. moloch and H. albibarbis, and between H. moloch and H. muelleri abbotti. These results suggest that reproductive barriers among Hylobates species are incomplete and that hybridisation and introgression occur whenever the distribution ranges come into contact. Some candidates for introgressed genomic regions were detected, and the functions of these would be revealed by further genome-wide studies.
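For readers unfamiliar with Patterson's D, the statistic compares counts of two discordant site patterns across four taxa (P1, P2, P3 and an outgroup O): D = (nABBA − nBABA) / (nABBA + nBABA). The sketch below is a deliberately simplified illustration and not the authors' analysis: it assumes one allele per taxon per site, whereas real analyses typically work from allele frequencies and assess significance with a block jackknife. The function name `patterson_d` and the toy sites are hypothetical.

```python
from typing import Iterable, Tuple

# One site = the allele observed in (P1, P2, P3, outgroup).
Site = Tuple[str, str, str, str]


def patterson_d(sites: Iterable[Site]) -> float:
    """Compute D = (ABBA - BABA) / (ABBA + BABA) from single-allele site patterns."""
    abba = 0
    baba = 0
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1  # derived allele shared by P2 and P3
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1  # derived allele shared by P1 and P3
    informative = abba + baba
    if informative == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / informative


# Invented toy data: three ABBA sites, one BABA site, one uninformative site.
toy_sites = [
    ("A", "G", "G", "A"),
    ("C", "T", "T", "C"),
    ("A", "G", "G", "A"),
    ("G", "A", "G", "A"),
    ("A", "A", "A", "A"),
]
d_value = patterson_d(toy_sites)
print(d_value)
```

A positive D indicates an excess of allele sharing between P2 and P3, a negative D an excess between P1 and P3; under no introgression the expected value is zero.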


2020 ◽  
Author(s):  
Steven M. Mussmann ◽  
Marlis R. Douglas ◽  
Tyler K. Chafin ◽  
Michael E. Douglas

Abstract
Background: Research on the molecular ecology of non-model organisms, while previously constrained, has now been greatly facilitated by the advent of reduced-representation sequencing protocols. However, tools that allow these large datasets to be efficiently parsed are often lacking, or if indeed available, then limited by the necessity of a comparable reference genome as an adjunct. This, of course, can be difficult when working with non-model organisms. Fortunately, pipelines are currently available that avoid this prerequisite, thus allowing data to be a priori parsed. An oft-used molecular ecology program (i.e., Structure), for example, is facilitated by such pipelines, yet they are surprisingly absent for a second program that is similarly popular and computationally more efficient (i.e., Admixture). The two programs differ in that Admixture employs a maximum-likelihood framework whereas Structure uses a Bayesian approach, yet both produce similar results. Given these issues, there is an overriding (and recognized) need among researchers in molecular ecology for bioinformatic software that will not only condense output from replicated Admixture runs, but also infer from these data the optimal number of population clusters (K).
Results: Here we provide such a program (i.e., AdmixPipe) that (a) filters SNPs to allow the delineation of population structure in Admixture, then (b) parses the output for summarization and graphical representation via Clumpak. Our benchmarks effectively demonstrate how efficient the pipeline is for processing large, non-model datasets generated via double digest restriction-site associated DNA sequencing (ddRAD). Outputs not only parallel those from Structure, but also visualize the variation among individual Admixture runs, so as to facilitate selection of the most appropriate K-value.
Conclusions: AdmixPipe successfully integrates Admixture analysis with popular variant call format (VCF) filtering software to yield file types readily analyzed by Clumpak. Large population genomic datasets derived from non-model organisms are efficiently analyzed via the parallel-processing capabilities of Admixture. AdmixPipe is distributed under the GNU Public License and freely available for Mac OSX and Linux platforms at: https://github.com/stevemussmann/admixturePipeline.
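One common heuristic for choosing K from replicated Admixture runs is to compare the cross-validation (CV) error that Admixture reports when run with its `--cv` option, and to prefer the K with the lowest mean error. The sketch below illustrates that idea only; it is not AdmixPipe's code and makes no claim about how AdmixPipe itself selects K. The function names and the example log lines (including the error values) are invented for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Admixture prints lines of the form "CV error (K=3): 0.52341" when run with --cv.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def mean_cv_by_k(log_lines):
    """Average the CV error over all replicate runs sharing the same K."""
    errors = defaultdict(list)
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))].append(float(match.group(2)))
    return {k: mean(values) for k, values in errors.items()}


def best_k(log_lines):
    """Return the K with the lowest mean CV error."""
    means = mean_cv_by_k(log_lines)
    if not means:
        raise ValueError("no CV error lines found")
    return min(means, key=means.get)


# Invented example: two replicate runs for each of K = 1, 2, 3.
example_log = [
    "CV error (K=1): 0.60",
    "CV error (K=1): 0.62",
    "CV error (K=2): 0.50",
    "CV error (K=2): 0.52",
    "CV error (K=3): 0.55",
    "CV error (K=3): 0.57",
]
chosen_k = best_k(example_log)
print(chosen_k)
```

In practice the lines would be collected from the log files of each replicate run, and the spread of CV errors within each K is worth inspecting alongside the mean.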


Agronomy ◽  
2019 ◽  
Vol 9 (12) ◽  
pp. 881
Author(s):  
Miguel Loera-Sánchez ◽  
Bruno Studer ◽  
Roland Kölliker

Grasslands are wide-spread, multi-species ecosystems that provide many valuable services. Plant genetic diversity (i.e., the diversity within species) is closely linked to ecosystem functioning in grasslands and constitutes an important reservoir of genetic resources that can be used to breed improved cultivars of forage grass and legume species. Assessing genetic diversity in grassland plant species is demanding due to the large number of different species and the level of resolution needed. However, recent methodological advances could help in tackling this challenge at a larger scale. In this review, we outline the methods that can be used to measure genetic diversity in plants, highlighting their strengths and limitations for genetic diversity assessments of grassland plant species, with a special focus on forage plants. Such methods can be categorized into DNA fragment, hybridization array, and high-throughput sequencing (HTS) methods, and they differ in terms of resolution, throughput, and multiplexing potential. Special attention is given to HTS approaches (i.e., plastid genome skimming, whole genome re-sequencing, reduced representation libraries, sequence capture, and amplicon sequencing), because they enable unprecedented large-scale assessments of genetic diversity in non-model organisms with complex genomes, such as forage grasses and legumes. As no single method may be suited for all kinds of purposes, we also provide practical perspectives for genetic diversity analyses in forage breeding and genetic resource conservation efforts.


2019 ◽  
Vol 9 (10) ◽  
pp. 3467-3476 ◽  
Author(s):  
Paul M. Hime ◽  
Jeffrey T. Briggler ◽  
Joshua S. Reece ◽  
David W. Weisrock

Systems of genetic sex determination and the homology of sex chromosomes in different taxa vary greatly across vertebrates. Much progress remains to be made in understanding systems of genetic sex determination in non-model organisms, especially those with homomorphic sex chromosomes and/or large genomes. We used reduced representation genome sequencing to investigate genetic sex determination systems in the salamander family Cryptobranchidae (genera Cryptobranchus and Andrias), which typifies both of these inherent difficulties. We tested hypotheses of male- or female-heterogamety by sequencing hundreds of thousands of anonymous genomic regions in a panel of known-sex cryptobranchids and characterized patterns of presence/absence, inferred zygosity, and depth of coverage to identify sex-linked regions of these 56 gigabase genomes. Our results strongly support the hypothesis that all cryptobranchid species possess homologous systems of female heterogamety, despite maintenance of homomorphic sex chromosomes over nearly 60 million years. Additionally, we report a robust, non-invasive genetic assay for sex diagnosis in Cryptobranchus and Andrias which may have great utility for conservation efforts with these endangered salamanders. Co-amplification of these W-linked markers in both cryptobranchid genera provides evidence for long-term sex chromosome stasis in one of the most divergent salamander lineages. These findings inform hypotheses about the ancestral mode of sex determination in salamanders, but suggest that comparative data from other salamander families are needed. Our results further demonstrate that massive genomes are not necessarily a barrier to effective genome-wide sequencing and that the resulting data can be highly informative about sex determination systems in taxa with homomorphic sex chromosomes.
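The presence/absence logic described above can be sketched in a few lines. Under female heterogamety (a ZW system), a W-linked locus is expected to be detected in every known female and in no known male. The example below is a minimal illustration under that assumption, not the authors' pipeline: the function name `candidate_w_linked`, the sample IDs, and the locus names are all hypothetical, and it ignores the zygosity and depth-of-coverage evidence the study also used.

```python
def candidate_w_linked(presence, sex):
    """Return loci detected in all known females and in no known males.

    presence: dict mapping locus name -> set of sample IDs where the locus was detected
    sex:      dict mapping sample ID -> "F" or "M"
    """
    females = {sample for sample, label in sex.items() if label == "F"}
    males = {sample for sample, label in sex.items() if label == "M"}
    if not females or not males:
        raise ValueError("need at least one known female and one known male")
    candidates = [
        locus
        for locus, samples in presence.items()
        if females <= samples and not (males & samples)
    ]
    return sorted(candidates)


# Invented toy panel: two females and two males of known sex.
toy_sex = {"f1": "F", "f2": "F", "m1": "M", "m2": "M"}
toy_presence = {
    "locus_a": {"f1", "f2"},              # female-specific: candidate W-linked
    "locus_b": {"f1", "f2", "m1", "m2"},  # present in both sexes
    "locus_c": {"f1"},                    # missing in one female
    "locus_d": {"m1", "m2"},              # male-specific
}
w_candidates = candidate_w_linked(toy_presence, toy_sex)
print(w_candidates)
```

With real data, missing loci can also reflect low coverage or allelic dropout, so a strict all-females rule would usually be relaxed to a tolerance threshold.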


Author(s):  
Rajanikanth Govindarajulu ◽  
Ashley N Hostetler ◽  
Yuguo Xiao ◽  
Srinivasa R Chaluvadi ◽  
Margarita Mauro-Herrera ◽  
...  

Abstract: Phenotypes such as branching, photoperiod sensitivity, and height were modified during plant domestication and crop improvement. Here, we perform quantitative trait locus (QTL) mapping of these and other agronomic traits in a recombinant inbred line (RIL) population derived from an interspecific cross between Sorghum propinquum and Sorghum bicolor inbred Tx7000. Using low-coverage Illumina sequencing and a bin-mapping approach, we generated ∼1920 bin markers spanning ∼875 cM. Phenotyping data were collected and analyzed from two field locations and one greenhouse experiment for six agronomic traits, thereby identifying a total of 30 QTL. Many of these QTL were penetrant across environments and co-mapped with major QTL identified in other studies. Other QTL uncovered new genomic regions associated with these traits, and some of these were environment-specific in their action. To further dissect the genetic underpinnings of tillering, we complemented QTL analysis with transcriptomics, identifying 6189 genes that were differentially expressed during tiller bud elongation. We identified genes such as Dormancy Associated Protein 1 (DRM1) in addition to various transcription factors that are differentially expressed in comparisons of dormant to elongating tiller buds and lie within tillering QTL, suggesting that these genes are key regulators of tiller elongation in sorghum. Our study demonstrates the usefulness of this RIL population in detecting domestication and improvement-associated genes in sorghum, thus providing a valuable resource for genetic investigation and improvement to the sorghum community.


Polar Biology ◽  
2014 ◽  
Vol 37 (8) ◽  
pp. 1213-1217 ◽  
Author(s):  
Eri Harada ◽  
Richard E. Lee ◽  
David L. Denlinger ◽  
Shin G. Goto

2020 ◽  
Author(s):  
Brendan N. Reid ◽  
Rachel L. Moran ◽  
Christopher J. Kopack ◽  
Sarah W. Fitzpatrick

Abstract: Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.

