Leveraging targeted sequencing for non-model species: a step-by-step guide to obtain a reduced SNP set and a pipeline to automate data processing in the Antarctic Midge, Belgica antarctica

A magnet to draw a bright needle out from the haystack -- RADOrgMiner, an automated pipeline to genotype organellar reads from RADseq data

10.22541/au.163255933.36686159/v1 ◽

2021 ◽

Author(s):

Levente Laczkó ◽

Sándor Jordán ◽

Gábor Sramkó

Keyword(s):

Large Scale ◽

Molecular Ecology ◽

Nuclear Genome ◽

Plastid Dna ◽

Lineage Sorting ◽

Genomic Libraries ◽

Reduced Representation ◽

Bioinformatic Tools ◽

Cytoplasmic Dna ◽

Automated Pipeline

Different versions of Restriction-site Associated DNA sequencing (RADseq) have become powerful and popular tools in molecular ecology. Although RADseq datasets are regarded as representative of the nuclear genome, reduced representation genomic libraries may also sample the organellar (mitochondrial and, in the case of plants, plastid) DNA. Extraction of organellar loci from RADseq data can provide additional insights into the phylogenetics of the study group at no additional sequencing effort. Cytoplasmic genetic variance can help to better understand evolutionary history by uncovering past hybridization and identifying the maternal (or, rarely, the paternal) lineage due to rapid lineage sorting. We developed a pipeline in bash, based on existing bioinformatic tools, to automatically mine and genotype organellar loci contained in RADseq libraries. The utility of our pipeline was tested on eight publicly available datasets spanning different phylogenetic levels (i.e. from family-level phylogenies to phylogeography) and RADseq methods (sdRAD, ddRAD, ezRAD, GBS) by genotyping both mitochondrial and plastid loci, which were then subjected to phylogenetic tree reconstruction. In all cases, organellar phylogenies adequately supplemented the original studies, either by corroborating the large-scale picture based on RADseq or by bringing additional evidence of past or contemporary hybridization. RADseq methods designed to achieve a larger horizontal coverage (i.e. ddRAD, ezRAD) evidently yielded longer organellar alignments, but sdRAD and GBS still provided useful polymorphic loci found in the cytoplasmic DNA. Our newly developed pipeline can be run under a Unix-like operating system and is freely accessible at https://github.com/laczkol/RADOrgMiner

Divergence and introgression in small apes, the genus Hylobates, revealed by reduced representation sequencing

10.1101/2020.05.31.126078 ◽

2020 ◽

Author(s):

Kazunari Matsudaira ◽

Takafumi Ishida

Keyword(s):

Amplicon Sequencing ◽

Single Nucleotide ◽

Reduced Representation ◽

Island Species ◽

Genome Wide ◽

Distribution Ranges ◽

Natural Hybridisation ◽

Quality Filtering ◽

Divergence Pattern ◽

Genomic Regions

Abstract Gibbons in the genus Hylobates, which live in Southeast Asia, show great diversity, comprising seven to nine species. Natural hybridisation has been observed in the species contact zones, although the roles played by hybridisation and introgression in the evolution of these species remain unclear. To uncover the divergence history and the contributions of hybridisation and introgression to the evolution of Hylobates, random amplicon sequencing-direct (GRAS-Di) analysis was employed to genotype 47 gibbons, representing eight species from three genera. After quality filtering, over 300,000 autosomal single-nucleotide variant (SNV) sites were identified. The SNV-based autosomal phylogeny, together with the mitochondrial phylogeny, supported a divergence pattern beginning approximately 4.3 million years ago. First, the mainland species, H. pileatus and H. lar, consecutively diverged from the Sundaic island species. Second, H. moloch, in Java (and likely H. klossii, in the Mentawai Islands), diverged from the other species. Third, H. muelleri, in Borneo, and H. agilis/H. albibarbis, in Sumatra and southwestern Borneo, diverged. Lastly, H. agilis and H. albibarbis diverged from each other. Patterson’s D-statistics indicated significant introgression between H. lar and H. pileatus, between H. lar and H. agilis, and between H. albibarbis and H. muelleri. Weak introgression was identified between H. moloch and H. albibarbis, and between H. moloch and H. muelleri abbotti. These results suggest that reproductive barriers among Hylobates species are incomplete and that hybridisation and introgression occur wherever the distribution ranges come into contact. Some candidates for introgressed genomic regions were detected, and the functions of these would be revealed by further genome-wide studies.
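For readers unfamiliar with Patterson's D (the ABBA-BABA test) mentioned in this abstract, the following is a minimal sketch of the basic site-count form of the statistic, D = (nABBA - nBABA) / (nABBA + nBABA), for a four-taxon arrangement (P1, P2, P3, outgroup). The function name `patterson_d` and the input layout (one allele per taxon per site) are illustrative assumptions, not the authors' implementation, which the abstract does not describe.

```python
from typing import Hashable, Iterable, Tuple

Site = Tuple[Hashable, Hashable, Hashable, Hashable]


def patterson_d(sites: Iterable[Site]) -> float:
    """Return Patterson's D from per-site alleles of (P1, P2, P3, outgroup).

    The outgroup allele is treated as ancestral ("A"); any other allele
    shared by P3 and exactly one of P1/P2 is treated as derived ("B").
    Hypothetical helper for illustration only.
    """
    abba = 0  # P2 and P3 share the derived allele
    baba = 0  # P1 and P3 share the derived allele
    for p1, p2, p3, out in sites:
        if p1 == out and p2 == p3 and p2 != out:
            abba += 1
        elif p2 == out and p1 == p3 and p1 != out:
            baba += 1
        # all other patterns are uninformative for D and are skipped
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / (abba + baba)


# Toy example: six ABBA sites, two BABA sites, one uninformative site.
toy_sites = [("A", "G", "G", "A")] * 6 + [("G", "A", "G", "A")] * 2 + [("A", "A", "A", "A")]
print(patterson_d(toy_sites))
```

A positive D indicates an excess of allele sharing between P2 and P3, a negative D between P1 and P3. In practice, significance is usually assessed with a block jackknife over the genome, which this sketch omits.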

AdmixPipe: Population analyses in Admixture for non-model organisms

10.1101/2020.07.06.190389 ◽

2020 ◽

Author(s):

Steven M. Mussmann ◽

Marlis R. Douglas ◽

Tyler K. Chafin ◽

Michael E. Douglas

Keyword(s):

Graphical Representation ◽

A Priori ◽

Large Population ◽

Molecular Ecology ◽

Optimal Number ◽

Model Organisms ◽

Variant Call ◽

Reduced Representation ◽

Ecology Program ◽

Selection Of

Abstract Background: Research on the molecular ecology of non-model organisms, while previously constrained, has now been greatly facilitated by the advent of reduced-representation sequencing protocols. However, tools that allow these large datasets to be efficiently parsed are often lacking, or if indeed available, then limited by the necessity of a comparable reference genome as an adjunct. This, of course, can be difficult when working with non-model organisms. Fortunately, pipelines are currently available that avoid this prerequisite, thus allowing data to be a priori parsed. An oft-used molecular ecology program (i.e., Structure), for example, is facilitated by such pipelines, yet they are surprisingly absent for a second program that is similarly popular and computationally more efficient (i.e., Admixture). The two programs differ in that Admixture employs a maximum-likelihood framework whereas Structure uses a Bayesian approach, yet both produce similar results. Given these issues, there is an overriding (and recognized) need among researchers in molecular ecology for bioinformatic software that will not only condense output from replicated Admixture runs, but also infer from these data the optimal number of population clusters (K). Results: Here we provide such a program (i.e., AdmixPipe) that (a) filters SNPs to allow the delineation of population structure in Admixture, then (b) parses the output for summarization and graphical representation via Clumpak. Our benchmarks effectively demonstrate how efficient the pipeline is for processing large, non-model datasets generated via double digest restriction-site associated DNA sequencing (ddRAD). Outputs not only parallel those from Structure, but also visualize the variation among individual Admixture runs, so as to facilitate selection of the most appropriate K-value. Conclusions: AdmixPipe successfully integrates Admixture analysis with popular variant call format (VCF) filtering software to yield file types readily analyzed by Clumpak. Large population genomic datasets derived from non-model organisms are efficiently analyzed via the parallel-processing capabilities of Admixture. AdmixPipe is distributed under the GNU Public License and freely available for Mac OSX and Linux platforms at: https://github.com/stevemussmann/admixturePipeline.
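One common way to compare K-values across replicated Admixture runs is to average the cross-validation (CV) error reported for each K and prefer the K with the lowest mean. The sketch below illustrates that idea only; it is not AdmixPipe's code. It assumes the log lines follow the form `CV error (K=3): 0.52341`, as printed when Admixture is run with cross-validation enabled, and the function name `best_k` is hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Assumed log format: "CV error (K=<int>): <float>"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9]*\.?[0-9]+)")


def best_k(log_lines: Iterable[str]) -> Tuple[int, Dict[int, float]]:
    """Return the K with the lowest mean CV error, plus all per-K means."""
    errors = defaultdict(list)
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            errors[int(match.group(1))].append(float(match.group(2)))
    if not errors:
        raise ValueError("no CV error lines found")
    means = {k: mean(values) for k, values in errors.items()}
    return min(means, key=means.get), means


# Toy log lines standing in for replicated runs at K = 1..3.
toy_log = [
    "CV error (K=1): 0.60",
    "CV error (K=2): 0.50",
    "CV error (K=2): 0.52",
    "CV error (K=3): 0.55",
    "some unrelated log line",
]
k, per_k = best_k(toy_log)
print(k, per_k)
```

Averaging hides the spread among replicates, which is why inspecting the variation among individual runs, as the abstract describes, remains useful alongside a single summary number.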

DNA-Based Assessment of Genetic Diversity in Grassland Plant Species: Challenges, Approaches, and Applications

Agronomy ◽

10.3390/agronomy9120881 ◽

2019 ◽

Vol 9 (12) ◽

pp. 881

Author(s):

Miguel Loera-Sánchez ◽

Bruno Studer ◽

Roland Kölliker

Keyword(s):

Genetic Diversity ◽

Plant Species ◽

Large Scale ◽

High Throughput Sequencing ◽

Amplicon Sequencing ◽

Model Organisms ◽

Special Focus ◽

Reduced Representation ◽

Forage Plants ◽

Single Method

Grasslands are wide-spread, multi-species ecosystems that provide many valuable services. Plant genetic diversity (i.e., the diversity within species) is closely linked to ecosystem functioning in grasslands and constitutes an important reservoir of genetic resources that can be used to breed improved cultivars of forage grass and legume species. Assessing genetic diversity in grassland plant species is demanding due to the large number of different species and the level of resolution needed. However, recent methodological advances could help in tackling this challenge at a larger scale. In this review, we outline the methods that can be used to measure genetic diversity in plants, highlighting their strengths and limitations for genetic diversity assessments of grassland plant species, with a special focus on forage plants. Such methods can be categorized into DNA fragment, hybridization array, and high-throughput sequencing (HTS) methods, and they differ in terms of resolution, throughput, and multiplexing potential. Special attention is given to HTS approaches (i.e., plastid genome skimming, whole genome re-sequencing, reduced representation libraries, sequence capture, and amplicon sequencing), because they enable unprecedented large-scale assessments of genetic diversity in non-model organisms with complex genomes, such as forage grasses and legumes. As no single method may be suited for all kinds of purposes, we also provide practical perspectives for genetic diversity analyses in forage breeding and genetic resource conservation efforts.

Genomic Data Reveal Conserved Female Heterogamety in Giant Salamanders with Gigantic Nuclear Genomes

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400556 ◽

2019 ◽

Vol 9 (10) ◽

pp. 3467-3476 ◽

Cited By ~ 2

Author(s):

Paul M. Hime ◽

Jeffrey T. Briggler ◽

Joshua S. Reece ◽

David W. Weisrock

Keyword(s):

Sex Determination ◽

Sex Chromosomes ◽

Sex Chromosome ◽

Model Organisms ◽

Reduced Representation ◽

Non Invasive ◽

Female Heterogamety ◽

Genome Wide ◽

Great Utility ◽

Genomic Regions

Systems of genetic sex determination and the homology of sex chromosomes in different taxa vary greatly across vertebrates. Much progress remains to be made in understanding systems of genetic sex determination in non-model organisms, especially those with homomorphic sex chromosomes and/or large genomes. We used reduced representation genome sequencing to investigate genetic sex determination systems in the salamander family Cryptobranchidae (genera Cryptobranchus and Andrias), which typifies both of these inherent difficulties. We tested hypotheses of male- or female-heterogamety by sequencing hundreds of thousands of anonymous genomic regions in a panel of known-sex cryptobranchids and characterized patterns of presence/absence, inferred zygosity, and depth of coverage to identify sex-linked regions of these 56 gigabase genomes. Our results strongly support the hypothesis that all cryptobranchid species possess homologous systems of female heterogamety, despite maintenance of homomorphic sex chromosomes over nearly 60 million years. Additionally, we report a robust, non-invasive genetic assay for sex diagnosis in Cryptobranchus and Andrias which may have great utility for conservation efforts with these endangered salamanders. Co-amplification of these W-linked markers in both cryptobranchid genera provides evidence for long-term sex chromosome stasis in one of the most divergent salamander lineages. These findings inform hypotheses about the ancestral mode of sex determination in salamanders, but suggest that comparative data from other salamander families are needed. Our results further demonstrate that massive genomes are not necessarily a barrier to effective genome-wide sequencing and that the resulting data can be highly informative about sex determination systems in taxa with homomorphic sex chromosomes.
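The presence/absence part of this approach can be illustrated with a small sketch: loci recovered in every known female and in no known male are candidates for W-linkage (female heterogamety), and the reverse pattern would point to Y-linkage (male heterogamety). The function name `sex_specific_loci` and the input layout are assumptions made for illustration. The study also used inferred zygosity and depth of coverage, which are not modelled here.

```python
from typing import Dict, List, Set, Tuple


def sex_specific_loci(
    presence: Dict[str, Set[str]],
    sexes: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (female-specific, male-specific) loci.

    presence maps a locus ID to the set of sample IDs in which it was
    recovered; sexes maps a sample ID to "F" or "M". Hypothetical helper.
    """
    females = {s for s, sex in sexes.items() if sex == "F"}
    males = {s for s, sex in sexes.items() if sex == "M"}
    female_specific, male_specific = [], []
    for locus, samples in presence.items():
        if females and females <= samples and not (samples & males):
            female_specific.append(locus)  # candidate W-linked locus
        elif males and males <= samples and not (samples & females):
            male_specific.append(locus)  # candidate Y-linked locus
    return sorted(female_specific), sorted(male_specific)


# Toy panel of two females and two males.
toy_sexes = {"f1": "F", "f2": "F", "m1": "M", "m2": "M"}
toy_presence = {
    "locA": {"f1", "f2"},              # all females, no males
    "locB": {"f1", "f2", "m1", "m2"},  # shared by both sexes
    "locC": {"m1", "m2"},              # all males, no females
    "locD": {"f1"},                    # only one female
}
print(sex_specific_loci(toy_presence, toy_sexes))
```

Requiring presence in every individual of one sex is a strict rule. With real reduced-representation data, locus dropout would normally call for a tolerance threshold, and candidates would need independent validation.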

Integration of high-density genetic mapping with transcriptome analysis uncovers numerous agronomic QTL and reveals candidate genes for the control of tillering in sorghum

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab024 ◽

2021 ◽

Author(s):

Rajanikanth Govindarajulu ◽

Ashley N Hostetler ◽

Yuguo Xiao ◽

Srinivasa R Chaluvadi ◽

Margarita Mauro-Herrera ◽

...

Keyword(s):

Agronomic Traits ◽

Interspecific Cross ◽

Crop Improvement ◽

Differentially Expressed ◽

Plant Domestication ◽

Sorghum Propinquum ◽

Ril Population ◽

Trait Locus ◽

Genomic Regions ◽

Low Coverage

Abstract Phenotypes such as branching, photoperiod sensitivity, and height were modified during plant domestication and crop improvement. Here, we perform quantitative trait locus (QTL) mapping of these and other agronomic traits in a recombinant inbred line (RIL) population derived from an interspecific cross between Sorghum propinquum and Sorghum bicolor inbred Tx7000. Using low-coverage Illumina sequencing and a bin-mapping approach, we generated ∼1920 bin markers spanning ∼875 cM. Phenotyping data were collected and analyzed from two field locations and one greenhouse experiment for six agronomic traits, thereby identifying a total of 30 QTL. Many of these QTL were penetrant across environments and co-mapped with major QTL identified in other studies. Other QTL uncovered new genomic regions associated with these traits, and some of these were environment-specific in their action. To further dissect the genetic underpinnings of tillering, we complemented QTL analysis with transcriptomics, identifying 6189 genes that were differentially expressed during tiller bud elongation. We identified genes such as Dormancy Associated Protein 1 (DRM1) in addition to various transcription factors that are differentially expressed in comparisons of dormant to elongating tiller buds and lie within tillering QTL, suggesting that these genes are key regulators of tiller elongation in sorghum. Our study demonstrates the usefulness of this RIL population in detecting domestication and improvement-associated genes in sorghum, thus providing a valuable resource for genetic investigation and improvement to the sorghum community.

Life history traits of adults and embryos of the Antarctic midge Belgica antarctica

Polar Biology ◽

10.1007/s00300-014-1511-0 ◽

2014 ◽

Vol 37 (8) ◽

pp. 1213-1217 ◽

Cited By ~ 8

Author(s):

Eri Harada ◽

Richard E. Lee ◽

David L. Denlinger ◽

Shin G. Goto

Keyword(s):

Life History ◽

Life History Traits ◽

Belgica Antarctica ◽

The Antarctic

An Eye for the Extreme: Photoreceptor Fine-Structure in the Antarctic Midge Belgica antarctica (Diptera: Chironomidae)

Applied Entomology and Zoology ◽

10.1303/aez.31.629 ◽

1996 ◽

Vol 31 (4) ◽

pp. 629-632 ◽

Cited By ~ 1

Author(s):

Benno V. MEYER-ROCHOW ◽

A. Walto REID

Keyword(s):

Fine Structure ◽

Belgica Antarctica ◽

The Antarctic

Rapture-ready darters: choice of reference genome and genotyping method (whole-genome or sequence capture) influence population genomic inference in Etheostoma

10.1101/2020.05.21.108274 ◽

2020 ◽

Author(s):

Brendan N. Reid ◽

Rachel L. Moran ◽

Christopher J. Kopack ◽

Sarah W. Fitzpatrick

Keyword(s):

Reference Genome ◽

Sequence Data ◽

Low Cost ◽

Read Depth ◽

Model Organisms ◽

Whole Genome ◽

Reduced Representation ◽

Sequence Capture ◽

Population Genomic ◽

The Impact

Abstract Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.
