orthoCapture: Facilitating Gene Capture Probe Design for Non-Model Species
Abstract
In non-model species, applications of targeted gene capture (selective enrichment of specific genomic regions of interest) in molecular ecology have been limited by the practicalities of capture design. Currently, the minimal requirement for designing capture probes is a transcriptome or an established reference genome for the species of interest. When an established, annotated reference genome is unavailable, one common approach is to design probes from annotated reference genomes (or transcriptomes) of related species. Unfortunately, as divergence between the probes and the genome of interest increases, as occurs under directional selection, capture performance decreases. Here I introduce orthoCapture, a tool that overcomes these limitations by mining unannotated whole-genome sequence (WGS) data from non-model species and/or their close relatives, allowing probes to be designed from multiple genomic sources. orthoCapture finds orthologs in WGS data from multiple related species to create a set of exon sequences that encompasses the diversity of the exons of interest. These “design sequences” can then be used to design capture probes for the species of interest. orthoCapture thus eliminates the need for transcriptome or whole-genome sequencing for bait capture experiments, making this technique accessible for molecular ecology and conservation studies. orthoCapture is run from a command-line interface on Unix systems and requires two inputs: a gene sequence from an unrelated annotated genome and a FASTA database from a target, unannotated genome (e.g., whole-genome shotgun contigs). The output, a set of sequence templates from the unannotated genomic data, allows probe creation by any commercial company providing gene capture services.