orthoCapture: Facilitating Gene Capture Probe Design for Non-Model Species

2019 ◽  
Author(s):  
M. Elise Lauterbur

Abstract In non-model species, targeted gene capture (selective enrichment of specific genomic regions of interest) applications in molecular ecology have been limited by the practicalities of capture design. Currently, the minimal requirement for designing capture probes is a transcriptome or an established reference genome for the species of interest. When an established, annotated reference genome is unavailable, one common approach is to design probes from annotated reference genomes (or transcriptomes) of related species. Unfortunately, as divergence between probes and the genome of interest increases, such as occurs during directional selection, capture performance decreases. Here I introduce orthoCapture, a tool to overcome such limitations by mining unannotated whole-genome sequence (WGS) data from non-model species and/or their close relatives to allow probe design using multiple genomic sources. orthoCapture finds orthologs in WGS data from multiple related species to create a set of exon sequences that encompasses the diversity of the exons of interest. These “design sequences” can then be used to design capture probes for the species of interest. orthoCapture thus eliminates the need for transcriptome or whole-genome sequencing for bait capture experiments, making this technique accessible for molecular ecology and conservation studies. orthoCapture is run from a command-line interface on Unix systems and requires as input a gene sequence from an unrelated annotated genome and a FASTA database from a target, unannotated genome (e.g., whole-genome shotgun contigs). The output, sequence templates from the unannotated genomic data, allows probe creation by any commercial company providing gene capture services.

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 751
Author(s):  
Kathleen O'Neill ◽  
Stacy Pirro

The Sweetleaf (Stevia rebaudiana: Asteraceae) is widely grown for use as a sweetener. We present the whole genome sequence and annotation of this species. A total of 146,838,888 paired-end reads consisting of 22.2G bases were obtained by sequencing one leaf from a commercially grown seedling. The reads were assembled by a de novo method followed by alignment to related species. Annotation was performed via GeneMark-ES. The raw and assembled data are publicly available via GenBank: Sequence Read Archive (SRR6792730) and Assembly (GCA_009936405).


Author(s):  
Fahao Wang ◽  
Jiahui Qi ◽  
Miao Tian ◽  
Yizhou Gao ◽  
Xiaohui Xiong ◽  
...  

Gummy stem blight (GSB), which is caused by three related species of Stagonosporopsis, is a devastating worldwide disease of cucurbit crops, including watermelon. Previously, S. cucurbitacearum was reported to be the major fungal cause of watermelon GSB in Southern China, where it causes a significant decrease in watermelon yield. Here, we present the draft whole genome sequence, gene prediction and annotation of S. cucurbitacearum strain DBTL4, isolated from diseased watermelon plants. To our knowledge, this is the first publicly available genome sequence of this species, and it will help to further our understanding of the pathogenic mechanism of S. cucurbitacearum in cucurbit plants.


2021 ◽  
Author(s):  
Josue Chinchilla-Vargas ◽  
Max F. Rothschild ◽  
Francesca Bertolini

Abstract Background Muskellunge (Esox masquinongy) is the largest and most prized game fish for anglers in North America. However, little is known about Muskellunge genetic diversity in Iowa’s propagation program. We used whole genome sequence from 12 brooding individuals from Iowa and publicly available RAD-seq of 625 individuals from the Saint Lawrence River in Canada to study the genetic differences between populations, analyze signatures of selection that might shed light on environmental adaptations, and evaluate the levels of genetic diversity in both populations. Given that there is no reference genome available for Muskellunge, reads were aligned to the genome of Pike (Esox lucius), a closely related species. Results Variant calling produced 7,886,471 biallelic variants for the Iowa population and 16,867 high quality SNPs that overlap with the Canadian samples. The Ti/Tv values were 1.09 and 1.29 for samples from Iowa and Canada, respectively. PCA and Admixture analyses showed a large genetic difference between Canadian and Iowan populations. Moreover, PCA showed a clustering by sex in the Iowan population, although window-based Fst did not find outlier regions. Window-based pooled heterozygosity found 6 highly heterozygous windows containing 244 genes in the Iowa population, and Fst comparing the Iowa and Canadian populations found 14 windows with Fst values larger than 0.9 containing 641 genes. One enriched GO term (sensory perception of pain) was found through pooled heterozygosity analyses. Although not significant, several enriched GO terms associated with growth and development were found through Fst analyses. Inbreeding calculated as Froh was 0.03 on average for the Iowa population and 0.32 on average for the Canadian samples. The higher inbreeding in the Canadian samples is presumably due to isolation of subpopulations.
Conclusions This study is the first of its kind in Muskellunge from Iowa, in which captured brood stock showed marked genetic differences from the Canadian population. Additionally, although genetic differentiation based on sex was observed, no major locus was detected. Inbreeding does not seem to be an immediate concern for Muskellunge in Iowa, but isolation of subpopulations has caused levels of homozygosity to increase in the Canadian Muskellunge population. These results prove the validity of using genomes of closely related species to perform genomic analyses when no reference genome assembly is available.
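The abstract reports inbreeding as Froh without describing the computation. As a minimal sketch, assuming the commonly used definition (total length of runs of homozygosity divided by the length of the genome covered), it could be computed as below. The function name, the half-open segment coordinates, and the example values are hypothetical and are not taken from the study.

```python
def f_roh(roh_segments, genome_length_bp):
    """Fraction of the genome inside runs of homozygosity (ROH).

    roh_segments: iterable of (start, end) pairs in base pairs, half-open
    and assumed to be non-overlapping (an assumption of this sketch).
    genome_length_bp: length of the genome covered by the analysis.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    total_roh = 0
    for start, end in roh_segments:
        if end < start:
            raise ValueError("segment end precedes start")
        total_roh += end - start
    return total_roh / genome_length_bp


# Hypothetical individual: 3 Mb of ROH in a 100 Mb genome.
example_segments = [(0, 1_000_000), (5_000_000, 7_000_000)]
example_value = f_roh(example_segments, 100_000_000)
```

Population-level values such as those quoted in the abstract would then be averages of this per-individual quantity.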


2021 ◽  
Author(s):  
Josue Chinchilla-Vargas ◽  
Jonathan Meerbeek ◽  
Max F. Rothschild ◽  
Francesca Bertolini

Abstract Background Muskellunge (Esox masquinongy) is the largest and most prized game fish for anglers in North America. However, little is known about Muskellunge genetic diversity in Iowa’s propagation program. We used whole genome sequence from 12 brooding individuals from Iowa and publicly available RAD-seq of 625 individuals from the Saint Lawrence River in Canada to study the genetic differences between populations, analyze signatures of selection that might shed light on environmental adaptations, and evaluate the levels of genetic diversity in both populations. Given that there is no reference genome available for Muskellunge, reads were aligned to the genome of Pike (Esox lucius), a closely related species. Results Variant calling produced 7,886,471 biallelic variants for the Iowa population and 16,867 high quality SNPs that overlap with the Canadian samples. The Ti/Tv values were 1.09 and 1.29 for samples from Iowa and Canada, respectively. PCA and Admixture analyses showed a large genetic difference between Canadian and Iowan populations. Moreover, PCA showed a clustering by sex in the Iowan population, although window-based Fst did not find outlier regions. Window-based pooled heterozygosity found 6 highly heterozygous windows containing 244 genes in the Iowa population, and Fst comparing the Iowa and Canadian populations found 14 windows with Fst values larger than 0.9 containing 641 genes. One enriched GO term (sensory perception of pain) was found through pooled heterozygosity analyses. Although not significant, several enriched GO terms associated with growth and development were found through Fst analyses. Inbreeding calculated as Froh was 0.03 on average for the Iowa population and 0.32 on average for the Canadian samples. The Canadian inbreeding rate is presumably due to isolation of subpopulations. Conclusions This study is the first of its kind in Muskellunge from Iowa, in which captured brood stock showed marked genetic differences from the Canadian population. Additionally, although genetic differentiation based on sex was observed, no major locus was detected. Inbreeding does not seem to be an immediate concern for Muskellunge in Iowa, but apparent isolation of subpopulations has caused levels of homozygosity to increase in the Canadian Muskellunge population. These results prove the validity of using genomes of closely related species to perform genomic analyses when no reference genome assembly is available.
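The Ti/Tv values quoted above are ratios of transitions (purine to purine or pyrimidine to pyrimidine changes) to transversions among the called SNPs. A minimal sketch of that count, assuming biallelic SNPs supplied as (reference, alternate) base pairs, is shown below. The function names and the toy input are illustrative assumptions and do not reproduce the study's variant-calling pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True when ref -> alt stays within purines or within pyrimidines."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) SNP pairs."""
    transitions = transversions = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; the ratio is undefined")
    return transitions / transversions


# Toy input: three transitions (A>G, C>T, T>C) and two transversions.
toy_snps = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C"), ("G", "T")]
toy_ratio = ti_tv_ratio(toy_snps)
```

Because each base has one possible transition and two possible transversions, purely random substitutions would give a ratio near 0.5.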


2017 ◽  
Author(s):  
Robert J. Schaefer ◽  
Mikkel Schubert ◽  
Ernest Bailey ◽  
Danika L. Bannasch ◽  
Eric Barrey ◽  
...  

Abstract Background To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next generation, high-density equine SNP array. Results Using whole genome sequence from 153 individuals representing 24 distinct breeds collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging genotype data from individuals with both whole genome sequence, and genotypes from lower-density, legacy SNP arrays, a subset of ∼5 million high-quality, high-density array candidate SNPs were selected based on breed representation and uniform spacing across the genome. Considering probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific) a set of ∼2 million SNPs were selected for a next-generation high-density SNP chip (MNEc2M). Genotype data were generated using the MNEc2M array from a cohort of 332 horses from 20 breeds and a lower-density array, consisting of ∼670 thousand SNPs (MNEc670k), was designed for genotype imputation. Conclusions Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Gehendra Bhattarai ◽  
Ainong Shi ◽  
Devi R. Kandel ◽  
Nora Solís-Gracia ◽  
Jorge Alberto da Silva ◽  
...  

Abstract The availability of well-assembled genome sequences and reduced sequencing costs have enabled the resequencing of many additional accessions in several crops, thus facilitating the rapid discovery and development of simple sequence repeat (SSR) markers. Although the genome sequence of inbred spinach line Sp75 is available, previous efforts have resulted in a limited number of useful SSR markers. Identification of additional polymorphic SSR markers will support genetics and breeding research in spinach. This study aimed to use the available genomic resources to mine and catalog a large number of polymorphic SSR markers. A search for SSR loci on six chromosome sequences of spinach line Sp75 using GMATA identified a total of 42,155 loci with repeat motifs of two to six nucleotides in the Sp75 reference genome. Whole-genome sequences (30x) of 21 additional accessions were aligned against the chromosome sequences of the reference genome and genotyped in silico using the HipSTR program by comparing and counting repeat-number variation across the SSR loci among the accessions. The SSR genotype data generated by HipSTR were filtered to remove monomorphic loci and loci with high missing rates, and a final set of 5986 polymorphic SSR loci was identified. The polymorphic SSR loci were present at a density of 12.9 SSRs/Mb and were physically mapped. Out of 36 randomly selected SSR loci for validation, two failed to amplify, while the remaining were all polymorphic in a set of 48 spinach accessions from 34 countries. Genetic diversity analysis performed using the SSR allele score data on the 48 spinach accessions showed three main population groups. This strategy to mine and develop polymorphic SSR markers by a comparative analysis of the genome sequences of multiple accessions and computational genotyping of the candidate SSR loci eliminates the need for laborious experimental screening. 
Our approach increased the efficiency of discovering a large set of novel polymorphic SSR markers, as demonstrated in this report.
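The SSR search described above looks for tandem repeats with motifs of two to six nucleotides. As a rough illustration only, the sketch below scans a sequence for such repeats with a regular expression. It is not GMATA's algorithm, and the function name, the minimum repeat count of three, and the toy sequence are assumptions made for the example.

```python
import re


def find_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest motif, and
    motifs that are themselves repeats of a shorter unit (for example
    "AA") are skipped so homopolymer runs are not reported.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        # A string is periodic if it occurs inside its own doubling
        # with the first and last characters removed.
        if motif in (motif + motif)[1:-1]:
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Toy sequence containing an (AT)4 repeat and a (GA)3 repeat.
toy_hits = find_ssrs("GGCATATATATCCGAGAGAGTT")
```

Real SSR miners typically apply motif-specific minimum repeat counts and handle imperfect repeats, which this sketch does not attempt.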


2018 ◽  
Vol 31 (10) ◽  
pp. 979-981 ◽  
Author(s):  
Riccardo Baroncelli ◽  
Serenella A. Sukno ◽  
Sabrina Sarrocco ◽  
Giovanni Cafà ◽  
Gaetan Le Floch ◽  
...  

Colletotrichum orchidophilum is a plant-pathogenic fungus infecting a wide range of plant species belonging to the family Orchidaceae. In addition to its economic impact, C. orchidophilum has been used in recent years in evolutionary studies because it represents the closest related species to the C. acutatum species complex. Here, we present the first draft whole-genome sequence of C. orchidophilum IMI 309357, providing a resource for future research on anthracnose of Orchidaceae and other hosts.

