orthoCapture: Facilitating Gene Capture Probe Design for Non-Model Species

Mapping Intimacies ◽

10.1101/703942 ◽

2019 ◽

Author(s):

M. Elise Lauterbur

Keyword(s):

Related Species ◽

Reference Genome ◽

Molecular Ecology ◽

Whole Genome Sequence ◽

Probe Design ◽

Whole Genome ◽

Model Species ◽

Gene Capture ◽

Selective Enrichment ◽

Close Relatives

Abstract: In non-model species, targeted gene capture (selective enrichment of specific genomic regions of interest) applications in molecular ecology have been limited by the practicalities of capture design. Currently, the minimal requirement for designing capture probes is a transcriptome or an established reference genome for the species of interest. When an established, annotated reference genome is unavailable, one common approach is to design probes from annotated reference genomes (or transcriptomes) of related species. Unfortunately, as divergence between probes and the genome of interest increases, such as occurs during directional selection, capture performance decreases. Here I introduce orthoCapture, a tool to overcome such limitations by mining unannotated whole-genome sequence (WGS) data from non-model species and/or their close relatives to allow probe design using multiple genomic sources. orthoCapture finds orthologs in WGS data from multiple related species to create a set of exon sequences that encompasses the diversity of the exons of interest. These “design sequences” can then be used to design capture probes for the species of interest. orthoCapture thus eliminates the need for transcriptome or whole-genome sequencing for bait capture experiments, making this technique accessible for molecular ecology and conservation studies. orthoCapture is used via a command-line interface on Unix systems and requires as input a gene sequence from an unrelated annotated genome and a fasta database from a target, unannotated genome (e.g., whole-genome shotgun contigs). The output, sequence templates from the unannotated genomic data, allows probe creation by any commercial company providing gene capture services.


The complete genome sequence of Stevia rebaudiana, the Sweetleaf

F1000Research ◽

10.12688/f1000research.24396.1 ◽

2020 ◽

Vol 9 ◽

pp. 751

Author(s):

Kathleen O'Neill ◽

Stacy Pirro

Keyword(s):

Genome Sequence ◽

Complete Genome Sequence ◽

Related Species ◽

Complete Genome ◽

De Novo ◽

Stevia Rebaudiana ◽

Whole Genome Sequence ◽

Whole Genome ◽

Link Type ◽

Sequence Read Archive

The Sweetleaf (Stevia rebaudiana: Asteraceae) is widely grown for use as a sweetener. We present the whole genome sequence and annotation of this species. A total of 146,838,888 paired-end reads consisting of 22.2G bases were obtained by sequencing one leaf from a commercially grown seedling. The reads were assembled by a de novo method followed by alignment to related species. Annotation was performed via GeneMark-ES. The raw and assembled data are publicly available via GenBank: Sequence Read Archive (SRR6792730) and Assembly (GCA_009936405).
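As a quick sanity check on the sequencing figures quoted above, the mean read length can be estimated from the two reported totals. This is a minimal sketch: the variable names are illustrative, and the read length it yields is an inference from rounded numbers, not a value stated in the record.

```python
# Figures quoted in the abstract; "22.2G bases" is rounded in the source.
total_reads = 146_838_888
total_bases = 22.2e9

# Mean bases per read, inferred from the two totals (not stated in the source).
mean_read_length = total_bases / total_reads
print(f"mean read length is roughly {mean_read_length:.0f} bases")
```

The result is consistent with reads of about 151 bases, a common Illumina read length, although the record itself does not name the platform or read length.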


Genome sequence resource for Stagonosporopsis cucurbitacearum, a cause of gummy stem blight disease of watermelon

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-02-21-0048-a ◽

2021 ◽

Author(s):

Fahao Wang ◽

Jiahui Qi ◽

Miao Tian ◽

Yizhou Gao ◽

Xiaohui Xiong ◽

...

Keyword(s):

Genome Sequence ◽

Related Species ◽

Southern China ◽

Gene Prediction ◽

Pathogenic Mechanism ◽

Whole Genome Sequence ◽

Whole Genome ◽

Stem Blight ◽

Gummy Stem Blight ◽

Blight Disease

Gummy stem blight (GSB), which is caused by three related species of Stagonosporopsis, is a devastating worldwide disease of cucurbit crops, including watermelon. S. cucurbitacearum was previously reported to be the major fungal cause of watermelon GSB in Southern China, where it causes a significant decrease in watermelon yield. Here, we present the draft whole genome sequence, gene prediction, and annotation of S. cucurbitacearum strain DBTL4, isolated from diseased watermelon plants. To our knowledge, this is the first publicly available genome sequence of this species, and it will help clarify the pathogenic mechanism of S. cucurbitacearum on cucurbit plants.


Signatures of selection and genomic diversity of Muskellunge (Esox masquinongy) from two populations in North America.

10.21203/rs.3.rs-264701/v1 ◽

2021 ◽

Author(s):

Josue Chinchilla-Vargas ◽

Max F. Rothschild ◽

Francesca Bertolini

Keyword(s):

Genetic Diversity ◽

North America ◽

Related Species ◽

Reference Genome ◽

Genetic Differences ◽

Genomic Diversity ◽

Whole Genome Sequence ◽

Closely Related Species ◽

Esox Masquinongy ◽

Signatures Of Selection

Abstract. Background: Muskellunge (Esox masquinongy) is the largest and most prized game fish for anglers in North America. However, little is known about Muskellunge genetic diversity in Iowa’s propagation program. We used whole genome sequence from 12 brooding individuals from Iowa and publicly available RAD-seq of 625 individuals from the Saint-Lawrence river in Canada to study the genetic differences between populations, analyze signatures of selection that might shed light on environmental adaptations, and evaluate the levels of genetic diversity in both populations. Given that there is no reference genome available for Muskellunge, reads were aligned to the genome of Pike (Esox lucius), a closely related species. Results: Variant calling produced 7,886,471 biallelic variants for the Iowa population and 16,867 high-quality SNPs that overlap with the Canadian samples. The Ti/Tv values were 1.09 and 1.29 for samples from Iowa and Canada, respectively. PCA and Admixture analyses showed a large genetic difference between the Canadian and Iowan populations. Moreover, PCA showed clustering by sex in the Iowan population, although window-based Fst did not find outlier regions. Window-based pooled heterozygosity found 6 highly heterozygous windows containing 244 genes in the Iowa population, and Fst comparing the Iowa and Canadian populations found 14 windows with Fst values larger than 0.9 containing 641 genes. One enriched GO term (sensory perception of pain) was found through pooled heterozygosity analyses. Although not significant, several enriched GO terms associated with growth and development were found through Fst analyses. Inbreeding calculated as Froh was 0.03 on average for the Iowa population and 0.32 on average for the Canadian samples. The inbreeding rate is presumably due to isolation of subpopulations.
Conclusions: This study is the first of its kind in Muskellunge from Iowa, in which captured brood stock showed marked genetic differences from the Canadian population. Additionally, although genetic differentiation based on sex was observed, no major locus was detected. Inbreeding does not seem to be an immediate concern for Muskellunge in Iowa, whereas isolation of subpopulations has caused levels of homozygosity to increase in the Canadian Muskellunge population. These results prove the validity of using genomes of closely related species to perform genomic analyses when no reference genome assembly is available.
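The Ti/Tv values reported above are the ratio of transitions (A<->G, C<->T) to transversions (all other single-base changes) among called SNPs. A minimal sketch of that calculation follows; the function name and the toy SNP list are hypothetical and are not taken from the study's pipeline.

```python
BASES = set("ACGT")
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs.

    Pairs that are not single-base A/C/G/T substitutions are ignored.
    """
    ti = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, ambiguous bases, and non-variants
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Hypothetical toy data: 3 transitions and 2 transversions.
example_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
example_ratio = ti_tv_ratio(example_snps)
print(example_ratio)
```

In practice the (ref, alt) pairs would come from the REF and ALT columns of a VCF file; that parsing step is omitted here.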


Signatures of selection and genomic diversity of Muskellunge (Esox masquinongy) from two populations in North America.

10.21203/rs.3.rs-264701/v2 ◽

2021 ◽

Author(s):

Josue Chinchilla-Vargas ◽

Jonathan Meerbeek ◽

Max F. Rothschild ◽

Francesca Bertolini

Keyword(s):

Genetic Diversity ◽

North America ◽

Related Species ◽

Reference Genome ◽

Genetic Differences ◽

Genomic Diversity ◽

Whole Genome Sequence ◽

Closely Related Species ◽

Esox Masquinongy ◽

Signatures Of Selection

Abstract. Background: Muskellunge (Esox masquinongy) is the largest and most prized game fish for anglers in North America. However, little is known about Muskellunge genetic diversity in Iowa’s propagation program. We used whole genome sequence from 12 brooding individuals from Iowa and publicly available RAD-seq of 625 individuals from the Saint-Lawrence river in Canada to study the genetic differences between populations, analyze signatures of selection that might shed light on environmental adaptations, and evaluate the levels of genetic diversity in both populations. Given that there is no reference genome available for Muskellunge, reads were aligned to the genome of Pike (Esox lucius), a closely related species. Results: Variant calling produced 7,886,471 biallelic variants for the Iowa population and 16,867 high-quality SNPs that overlap with the Canadian samples. The Ti/Tv values were 1.09 and 1.29 for samples from Iowa and Canada, respectively. PCA and Admixture analyses showed a large genetic difference between the Canadian and Iowan populations. Moreover, PCA showed clustering by sex in the Iowan population, although window-based Fst did not find outlier regions. Window-based pooled heterozygosity found 6 highly heterozygous windows containing 244 genes in the Iowa population, and Fst comparing the Iowa and Canadian populations found 14 windows with Fst values larger than 0.9 containing 641 genes. One enriched GO term (sensory perception of pain) was found through pooled heterozygosity analyses. Although not significant, several enriched GO terms associated with growth and development were found through Fst analyses. Inbreeding calculated as Froh was 0.03 on average for the Iowa population and 0.32 on average for the Canadian samples. The Canadian inbreeding rate is presumably due to isolation of subpopulations.
Conclusions: This study is the first of its kind in Muskellunge from Iowa, in which captured brood stock showed marked genetic differences from the Canadian population. Additionally, although genetic differentiation based on sex was observed, no major locus was detected. Inbreeding does not seem to be an immediate concern for Muskellunge in Iowa, but apparent isolation of subpopulations has caused levels of homozygosity to increase in the Canadian Muskellunge population. These results prove the validity of using genomes of closely related species to perform genomic analyses when no reference genome assembly is available.
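Froh, as commonly defined, is the fraction of the genome that lies in runs of homozygosity (ROH). The sketch below illustrates that definition only; the function name, segment coordinates, and genome length are hypothetical, and it assumes non-overlapping half-open (start, end) segments that have already been called by some ROH-detection tool.

```python
def f_roh(roh_segments, genome_length):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of non-overlapping (start, end) pairs, end exclusive.
    genome_length: total length of the genome scanned for ROH, in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_roh = sum(end - start for start, end in roh_segments)
    return total_roh / genome_length


# Hypothetical individual: 3 Mb of ROH in a 100 Mb scanned genome.
segments = [(0, 2_000_000), (10_000_000, 11_000_000)]
example_froh = f_roh(segments, 100_000_000)
print(f"F_ROH = {example_froh:.2f}")
```

A population-level figure such as the averages quoted above would be the mean of this per-individual value.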


Developing a 670k genotyping array to tag ∼2M SNPs across 24 horse breeds

10.1101/112979 ◽

2017 ◽

Author(s):

Robert J. Schaefer ◽

Mikkel Schubert ◽

Ernest Bailey ◽

Danika L. Bannasch ◽

Eric Barrey ◽

...

Keyword(s):

Genome Sequence ◽

High Density ◽

Genotype Imputation ◽

Whole Genome Sequence ◽

Probe Design ◽

Candidate Snps ◽

Genotype Data ◽

Domestic Horse ◽

Whole Genome ◽

Next Generation

Abstract. Background: To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next-generation, high-density equine SNP array. Results: Using whole genome sequence from 153 individuals representing 24 distinct breeds collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging genotype data from individuals with both whole genome sequence and genotypes from lower-density, legacy SNP arrays, a subset of ∼5 million high-quality, high-density array candidate SNPs were selected based on breed representation and uniform spacing across the genome. Considering probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific), a set of ∼2 million SNPs were selected for a next-generation high-density SNP chip (MNEc2M). Genotype data were generated using the MNEc2M array from a cohort of 332 horses from 20 breeds, and a lower-density array, consisting of ∼670 thousand SNPs (MNEc670k), was designed for genotype imputation. Conclusions: Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse.


Genome-wide simple sequence repeats (SSR) markers discovered from whole-genome sequence comparisons of multiple spinach accessions

Scientific Reports ◽

10.1038/s41598-021-89473-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gehendra Bhattarai ◽

Ainong Shi ◽

Devi R. Kandel ◽

Nora Solís-Gracia ◽

Jorge Alberto da Silva ◽

...

Keyword(s):

Ssr Markers ◽

Genome Sequence ◽

Reference Genome ◽

Whole Genome Sequence ◽

Large Set ◽

Whole Genome ◽

Genome Sequences ◽

Sequence Comparisons ◽

Ssr Loci ◽

Simple Sequence

Abstract: The availability of well-assembled genome sequences and reduced sequencing costs have enabled the resequencing of many additional accessions in several crops, thus facilitating the rapid discovery and development of simple sequence repeat (SSR) markers. Although the genome sequence of inbred spinach line Sp75 is available, previous efforts have resulted in a limited number of useful SSR markers. Identification of additional polymorphic SSR markers will support genetics and breeding research in spinach. This study aimed to use the available genomic resources to mine and catalog a large number of polymorphic SSR markers. A search for SSR loci on six chromosome sequences of spinach line Sp75 using GMATA identified a total of 42,155 loci with repeat motifs of two to six nucleotides in the Sp75 reference genome. Whole-genome sequences (30x) of 21 additional accessions were aligned against the chromosome sequences of the reference genome and genotyped in silico using the HipSTR program by comparing and counting repeat-number variation across the SSR loci among the accessions. The SSR genotype data generated by HipSTR were filtered to remove monomorphic loci and loci with high missingness, and a final set of 5986 polymorphic SSR loci was identified. The polymorphic SSR loci were present at a density of 12.9 SSRs/Mb and were physically mapped. Out of 36 randomly selected SSR loci for validation, two failed to amplify, while the remaining were all polymorphic in a set of 48 spinach accessions from 34 countries. Genetic diversity analysis performed using the SSR allele score data on the 48 spinach accessions showed three main population groups. This strategy to mine and develop polymorphic SSR markers by a comparative analysis of the genome sequences of multiple accessions and computational genotyping of the candidate SSR loci eliminates the need for laborious experimental screening.
Our approach increased the efficiency of discovering a large set of novel polymorphic SSR markers, as demonstrated in this report.
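To illustrate what a search for SSR loci with motifs of two to six nucleotides involves, here is a minimal regex-based sketch. It is not GMATA and does not reproduce its algorithm or parameters; the function name, the minimum repeat count of three, and the toy sequence are all assumptions made for illustration.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) for tandem repeats in a DNA string.

    The lazy quantifier makes the regex try the shortest motif first, so
    ATATATAT is reported as (AT)4 rather than (ATAT)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


# Hypothetical toy sequence containing (AT)4 and (GA)3.
toy_sequence = "GGCATATATATCCGAGAGAGTT"
toy_hits = find_ssrs(toy_sequence)
print(toy_hits)
```

Polymorphism would then be assessed by comparing repeat counts at each locus across accessions, which in the study above was done with HipSTR rather than with code like this.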


Whole-Genome Sequence of the Orchid Anthracnose Pathogen Colletotrichum orchidophilum

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-03-18-0055-a ◽

2018 ◽

Vol 31 (10) ◽

pp. 979-981 ◽

Cited By ~ 7

Author(s):

Riccardo Baroncelli ◽

Serenella A. Sukno ◽

Sabrina Sarrocco ◽

Giovanni Cafà ◽

Gaetan Le Floch ◽

...

Keyword(s):

Genome Sequence ◽

Related Species ◽

Species Complex ◽

Pathogenic Fungus ◽

Whole Genome Sequence ◽

Future Research ◽

Whole Genome ◽

Wide Range ◽

The Family ◽

Evolutionary Studies

Colletotrichum orchidophilum is a plant-pathogenic fungus infecting a wide range of plant species belonging to the family Orchidaceae. In addition to its economic impact, C. orchidophilum has been used in recent years in evolutionary studies because it represents the closest related species to the C. acutatum species complex. Here, we present the first-draft whole-genome sequence of C. orchidophilum IMI 309357, providing a resource for future research on anthracnose of Orchidaceae and other hosts.


30-OR: Whole Genome Sequence Analysis of Type 2 Diabetes Risk in 44,713 Humans of Diverse Ancestry in the TOPMed Study

Diabetes ◽

10.2337/db19-30-or ◽

2019 ◽

Vol 68 (Supplement 1) ◽

pp. 30-OR

Author(s):

HEATHER M. HIGHLAND ◽

JENNIFER WESSEL ◽

ALISA MANNING ◽

Keyword(s):

Type 2 Diabetes ◽

Sequence Analysis ◽

Genome Sequence ◽

Diabetes Risk ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Analysis


Whole genome sequence comparison of endemic multi-resistant Escherichia coli clones

10.26226/morressier.56d5ba26d462b80296c94b2c ◽

2016 ◽

Author(s):

Tove Havnhøj Frandsen

Keyword(s):

Escherichia Coli ◽

Genome Sequence ◽

Sequence Comparison ◽

Whole Genome Sequence ◽

Whole Genome ◽

Genome Sequence Comparison


Faculty Opinions recommendation of Optimal algorithms for haplotype assembly from whole-genome sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13339986.14707085 ◽

2011 ◽

Author(s):

Alejandro Schaffer

Keyword(s):

Genome Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Whole Genome ◽

Optimal Algorithms ◽

Genome Sequence Data ◽

Haplotype Assembly
