Identifying Conserved Genomic Elements and Designing Universal Probe Sets To Enrich Them

2016 ◽  
Author(s):  
Brant C. Faircloth

Abstract Targeted enrichment of conserved genomic regions is a popular method for collecting large amounts of sequence data from non-model taxa for phylogenetic, phylogeographic, and population genetic studies. Yet, few open-source workflows are available to identify conserved genomic elements shared among divergent taxa and to design enrichment baits targeting these regions. These shortcomings limit the application of targeted enrichment methods to many organismal groups. Here, I describe a universal workflow for identifying conserved genomic regions in available genomic data and for designing targeted enrichment baits to collect data from these conserved regions. I demonstrate how this computational approach can be applied to diverse organismal groups by identifying sets of conserved loci and designing enrichment baits targeting thousands of these loci in the understudied arthropod groups Arachnida, Coleoptera, Diptera, Hemiptera, and Lepidoptera. I then use in silico analyses to demonstrate that these conserved loci reconstruct the accepted relationships among genome sequences from the focal arthropod orders, and we perform in vitro validation of the Arachnid probe set as part of a separate manuscript (Starrett et al. Submitted). All of the documentation, design steps, software code, and probe sets developed here are available under an open-source license for restriction-free testing and use by any research group, and although the examples in this manuscript focus on understudied and exceptionally diverse arthropod groups, the software workflow is applicable to all organismal groups having some form of pre-existing genomic information.

2016 ◽  
Author(s):  
James Starrett ◽  
Shahan Derkarabetian ◽  
Marshal Hedin ◽  
Robert W. Bryson ◽  
John E. McCormack ◽  
...  

Abstract Arachnida is an ancient, diverse, and ecologically important animal group that contains a number of species of interest for medical, agricultural, and engineering applications. Despite this applied importance, many aspects of the arachnid tree of life remain unresolved, hindering comparative approaches to arachnid biology. Biologists have made considerable efforts to resolve the arachnid phylogeny; yet, limited and challenging morphological characters, as well as a dearth of genetic resources, have confounded these attempts. Here, we present a genomic toolkit for arachnids featuring hundreds of conserved DNA regions (ultraconserved elements or UCEs) that allow targeted sequencing of any species in the arachnid tree of life. We used recently developed capture probes designed from conserved genomic regions of available arachnid genomes to enrich a sample of loci from 32 diverse arachnids. Sequence capture returned an average of 487 UCE loci for all species, with a range from 170 to 722. Phylogenetic analysis of these UCEs produced a highly resolved arachnid tree with relationships largely consistent with recent transcriptome-based phylogenies. We also tested the phylogenetic informativeness of UCE probes within the spider, scorpion, and harvestman orders, demonstrating the utility of these markers at shallower taxonomic scales, even down to the level of species differences. This probe set will open the door to phylogenomic and population genomic studies across the arachnid tree of life, enabling systematics, species delimitation, species discovery, and conservation of these diverse arthropods.


Animals ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 3208
Author(s):  
J. Kor Oldenbroek

The conservation of genetic diversity, both among and within breeds, is a costly process. Therefore, choices between breeds and animals within breeds are unavoidable, either for conservation in vitro (gene banks) or in vivo (keeping small populations alive). Nowadays, genomic information on breeds and individual animals is the standard basis for the choices to be made in conservation. Genomics can accurately measure the genetic distances among breeds and the relationships among animals within breeds. Homozygosity at loci and at parts of chromosomes is used to measure inbreeding. In addition, genomics can be used to detect potentially valuable rare alleles and haplotypes and their carriers in these breeds, and can facilitate in vivo or in vitro conservation of these genomic regions.


2020 ◽  
Author(s):  
Abeer F. El Nahas ◽  
Nasema M. Elkatatny ◽  
Haitham G. Abo-Al-Ela

Abstract SARS-CoV-2 has rapidly spread around the world. Several mutations have been detected in its genome, but they do not seem to affect the abilities of the virus to spread or infect. We aimed to explore the conserved genomic regions in coronavirus that could contain the key strengths of the virus. SARS-CoV-2 sequence data were retrieved from GenBank from the period of December 2019 to March 2020. Phylogenetic analyses were conducted for 207 sequences using MEGAX compared with the reference sequence (MN908947.3- CHN-Wuhan Dec-2019). The analysis included seven important genomic regions, the ORF1ab gene (21,290 bp), S gene (3,822 bp), Orf3a gene (827 bp), E gene (227 bp), M gene (669 bp), and N gene (1,259 bp), which play critical roles in virus invasion and replication. Furthermore, the variant nucleotides and amino acids were detected by MEGAX and BLAST. Through the phylogenetic analysis and amino acid substitution, the ORF1ab gene showed 11 conserved regions and also several variable sites. The E and M genes were mainly conserved, and all sequences were included in one clade, with one or two amino acid variants. Orf3a and the N gene have four conserved sites distributed along the genes. The S gene has 12 mutations and four main large conserved regions. We conclude that the favored occurrence of mutations at the ORF1ab and Orf3a genes during the SARS-CoV epidemic is an important mechanism for virus pathogenesis. The E and M proteins have an almost conserved structure, whereas the S and N genes have many conserved regions, which could serve as possible targets for vaccine design for SARS-CoV.


Author(s):  
Frieder Hadlich ◽  
Henry Reyer ◽  
Michael Oster ◽  
Nares Trakooljul ◽  
Eduard Muráni ◽  
...  

Abstract Commercial and customized microarrays are valuable tools for the analysis of holistic expression patterns, but require the integration of the latest genomic information. This study provides a comprehensive workflow implemented in an R package (rePROBE) to assign all probes and to annotate the probe sets based on up-to-date genomic and transcriptomic information. The rePROBE R package is freely available at https://github.com/friederhadlich/rePROBE. It can be applied to available gene expression microarray platforms and addresses both public and custom databases. The revised probe assignment and updated probe-set annotation were applied to commercial microarrays available for different livestock species, i.e. ChiGene-1_0-st (Gallus gallus, 443,579 probes; 18,530 probe sets), PorGene-1_1-st (Sus scrofa, 592,005; 25,779) and BovGene-1_0-st (Bos taurus, 530,717; 24,759) as well as human (Homo sapiens, HuGene-1_0-st) and mouse (Mus musculus, HT_MG-430_PM) microarrays. Using current species-specific transcriptomic information (RefSeq, Ensembl and partially non-redundant nucleotide sequences) and genomic information, the applied workflow revealed 297,574 probes for chickens (pig: 384,715; cattle: 363,077; human: 481,168; mouse: 324,942) assigned to 15,689 probe sets (pig: 21,673; cattle: 21,238; human: 23,495; mouse: 32,494). These are representative of 12,641 unique genes that were both annotated and positioned (pig: 15,758; cattle: 18,046; human: 20,167; mouse: 16,335). Additionally, the workflow collects information on the number of single nucleotide polymorphisms (SNPs) within respective targeted genomic regions and thus provides a detailed basis for comprehensive analyses such as expression quantitative trait locus (eQTL) studies to identify quantitative and functional traits.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Eleanor F. Miller ◽  
Andrea Manica

Abstract Background Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms, classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial; there are many complexities associated with data extraction, data quality and data handling. Results Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions There is now more genetic information available than ever before and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.


Plant Methods ◽  
2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Peio Ziarsolo ◽  
Tomas Hasing ◽  
Rebeca Hilario ◽  
Victor Garcia-Carpintero ◽  
Jose Blanca ◽  
...  

Abstract Background K-seq, a new genotyping methodology based on the amplification of genomic regions using two steps of Klenow amplification with short oligonucleotides, followed by standard PCR and Illumina sequencing, is presented. The protocol was accompanied by software developed to aid with primer set design. Results As the first examples, K-seq in species as diverse as tomato, dog and wheat was developed. K-seq provided genetic distances similar to those based on WGS in dogs. Experiments comparing K-seq and GBS in tomato showed similar genetic results, although K-seq had the advantage of finding more SNPs for the same number of Illumina reads. The technology reproducibility was tested with two independent runs of the tomato samples, and the correlation coefficient of the SNP coverages between samples was 0.8 and the genotype match was above 94%. K-seq also proved to be useful in polyploid species. The wheat samples generated specific markers for all subgenomes, and the SNPs generated from the diploid ancestors were located in the expected subgenome with accuracies greater than 80%. Conclusion K-seq is an open, patent-unencumbered, easy-to-set-up, cost-effective and reliable technology ready to be used by any molecular biology laboratory without special equipment in many genetic studies.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 25-25
Author(s):  
Muhammad Yasir Nawaz ◽  
Rodrigo Pelicioni Savegnago ◽  
Cedric Gondro

Abstract In this study, we detected genome-wide footprints of selection in Hanwoo and Angus beef cattle using different allele frequency and haplotype-based methods based on imputed whole genome sequence data. Our dataset included 13,202 Angus and 10,437 Hanwoo animals with 10,057,633 and 13,241,550 imputed SNPs, respectively. A subset of data with 6,873,624 common SNPs between the two populations was used to estimate signatures of selection parameters, both within (runs of homozygosity and extended haplotype homozygosity) and between (allele fixation index, extended haplotype homozygosity) the breeds in order to infer evidence of selection. We observed that correlations between various measures of selection ranged from 0.01 to 0.42. Assuming these parameters were complementary to each other, we combined them into a composite selection signal to identify regions under selection in both beef breeds. The composite signal was based on the average of fractional ranks of individual selection measures for every SNP. We identified some selection signatures that were common between the breeds while others were independent. We also observed that more genomic regions were selected in Angus as compared to Hanwoo. Candidate genes within significant genomic regions may help explain mechanisms of adaptation, domestication history and loci for important traits in Angus and Hanwoo cattle. In the future, we will use the top SNPs under selection for genomic prediction of carcass traits in both breeds.
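The composite signal described in this abstract (the average of fractional ranks of the individual selection measures for every SNP) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the measure labels `fst` and `xpehh`, the toy values, the tie handling (average ranks), and the scaling of ranks by n + 1 are all assumptions made for illustration only.

```python
from typing import Dict, List


def fractional_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order (ties get the average rank),
    then divide by n + 1 so every rank lies strictly between 0 and 1.
    The n + 1 scaling is an assumption, not taken from the abstract."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r / (n + 1) for r in ranks]


def composite_signal(measures: Dict[str, List[float]]) -> List[float]:
    """Average the fractional ranks across all measures, SNP by SNP."""
    per_measure = [fractional_ranks(v) for v in measures.values()]
    n_snps = len(per_measure[0])
    if any(len(col) != n_snps for col in per_measure):
        raise ValueError("every measure needs one value per SNP")
    return [
        sum(col[s] for col in per_measure) / len(per_measure)
        for s in range(n_snps)
    ]


# Hypothetical toy input: two selection measures scored on three SNPs.
toy = {"fst": [0.1, 0.5, 0.3], "xpehh": [1.0, 3.0, 2.0]}
scores = composite_signal(toy)
```

In this sketch a higher composite score means the SNP ranks consistently high across the measures, which is one simple way to combine statistics that are only weakly correlated with each other.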


1972 ◽  
Vol 21 (1-2) ◽  
pp. 21-52 ◽  
Author(s):  
Bernardo Beiguelman

Summary The present paper reviews the research lines which have been explored to evaluate to what extent genetic factors intervene in the mechanism of resistance and susceptibility to leprosy. It presents a critical discussion of the investigations on the familial association of leprosy, familial association of leprosy types, intrafamilial contagion of leprosy, concordance of leprosy in twin pairs, racial differences in leprosy prevalence and lepromatous rate, pedigree studies, association of leprosy with genetic markers, Australia antigen, and dermatoglyphic patterns. Space was also allotted to review family and twin-pair studies on the Mitsuda reaction, as well as the investigation of the in vitro behaviour of blood macrophages against killed M. leprae. Some areas in which further research on leprosy and genetics may be considered a priority are outlined in some detail.


2006 ◽  
Vol 395 (3) ◽  
pp. 587-598 ◽  
Author(s):  
Ramin Nazarian ◽  
Marta Starcevic ◽  
Melissa J. Spencer ◽  
Esteban C. Dell'Angelica

Dysbindin was identified as a dystrobrevin-binding protein potentially involved in the pathogenesis of muscular dystrophy. Subsequently, genetic studies have implicated variants of the human dysbindin-encoding gene, DTNBP1, in the pathogeneses of Hermansky–Pudlak syndrome and schizophrenia. The protein is a stable component of a multisubunit complex termed BLOC-1 (biogenesis of lysosome-related organelles complex-1). In the present study, the significance of the dystrobrevin–dysbindin interaction for BLOC-1 function was examined. Yeast two-hybrid analyses, and binding assays using recombinant proteins, demonstrated direct interaction involving coiled-coil-forming regions in both dysbindin and the dystrobrevins. However, recombinant proteins bearing the coiled-coil-forming regions of the dystrobrevins failed to bind endogenous BLOC-1 from HeLa cells or mouse brain or muscle, under conditions in which they bound the Dp71 isoform of dystrophin. Immunoprecipitation of endogenous dysbindin from brain or muscle resulted in robust co-immunoprecipitation of the pallidin subunit of BLOC-1 but no specific co-immunoprecipitation of dystrobrevin isoforms. Within BLOC-1, dysbindin is engaged in interactions with three other subunits, named pallidin, snapin and muted. We herein provide evidence that the same 69-residue region of dysbindin that is sufficient for dystrobrevin binding in vitro also contains the binding sites for pallidin and snapin, and at least part of the muted-binding interface. Functional, histological and immunohistochemical analyses failed to detect any sign of muscle pathology in BLOC-1-deficient, homozygous pallid mice. Taken together, these results suggest that dysbindin assembled into BLOC-1 is not a physiological binding partner of the dystrobrevins, likely due to engagement of its dystrobrevin-binding region in interactions with other subunits.

