Assexon: Assembling Exon Using Gene Capture Data

2019 ◽  
Vol 15 ◽  
pp. 117693431987479 ◽  
Author(s):  
Hao Yuan ◽  
Calder Atta ◽  
Luke Tornabene ◽  
Chenhong Li

Exon capture across species has been one of the most broadly applied approaches to acquire multi-locus data in phylogenomic studies of non-model organisms. Methods for assembling loci from short-read sequences (eg, Illumina platforms) that rely on mapping reads to a reference genome may not be suitable for studies comprising species across a wide phylogenetic spectrum; thus, de novo assembling methods are more generally applied. Current approaches for assembling targeted exons from short reads are not particularly optimized as they cannot (1) assemble loci with low read depth, (2) handle large files efficiently, and (3) reliably address issues with paralogs. Thus, we present Assexon: a streamlined pipeline that de novo assembles targeted exons and their flanking sequences from raw reads. We tested our method using reads from Lepisosteus osseus (4.37 Gb) and Boleophthalmus pectinirostris (2.43 Gb), which were captured using baits designed from the genome sequences of Lepisosteus oculatus and Oreochromis niloticus, respectively. We compared the performance of Assexon to PHYLUCE and HybPiper, which are commonly used pipelines to assemble ultra-conserved element (UCE) and Hyb-seq data. A custom exon capture analysis pipeline (CP) developed by Yuan et al was compared as well. Assexon accurately assembled 3400 to 3800 (20%-28%) more loci than PHYLUCE and 1900 to 2300 (8%-14%) more loci than HybPiper across different levels of phylogenetic divergence. Assexon ran at least twice as fast as PHYLUCE and HybPiper. The number of loci assembled using CP was comparable with Assexon in both tests, while Assexon ran at least 7 times faster than CP. In addition, some steps of CP require the user's interaction and are not fully automated, and this user time was not counted in our calculation. Both Assexon and CP retrieved no paralogs in the testing runs, but PHYLUCE and HybPiper did.
In conclusion, Assexon is a tool for accurate and efficient assembling of large read sets from exon capture experiments. Furthermore, Assexon includes scripts to filter poorly aligned coding regions and flanking regions, calculate summary statistics of loci, and select loci with reliable phylogenetic signal. Assexon is available at https://github.com/yhadevol/Assexon .

1990 ◽  
Vol 10 (2) ◽  
pp. 518-527
Author(s):  
T de Lange ◽  
L Shiue ◽  
R M Myers ◽  
D R Cox ◽  
S L Naylor ◽  
...  

Mammalian telomeres are thought to be composed of a tandem array of TTAGGG repeats. To further define the type and arrangement of sequences at the ends of human chromosomes, we developed a direct cloning strategy for telomere-associated DNA. The method involves a telomere enrichment procedure based on the relative lack of restriction endonuclease cutting sites near the ends of human chromosomes. Nineteen (TTAGGG)n-bearing plasmids were isolated, two of which contain additional human sequences proximal to the telomeric repeats. These telomere-flanking sequences detect BAL 31-sensitive loci and thus are located close to chromosome ends. One of the flanking regions is part of a subtelomeric repeat that is present at 10 to 25% of the chromosome ends in the human genome. This sequence is not conserved in rodent DNA and therefore should be a helpful tool for physical characterization of human chromosomes in human-rodent hybrid cell lines; some of the chromosomes that may be analyzed in this manner have been identified, i.e., 7, 16, 17, and 21. The minimal size of the subtelomeric repeat is 4 kilobases (kb); it shows a high frequency of restriction fragment length polymorphisms and undergoes extensive de novo methylation in somatic cells. Distal to the subtelomeric repeat, the chromosomes terminate in a long region (up to 14 kb) that may be entirely composed of TTAGGG repeats. This terminal segment is unusually variable. Although sperm telomeres are 10 to 14 kb long, telomeres in somatic cells are several kilobase pairs shorter and very heterogeneous in length. Additional telomere reduction occurs in primary tumors, indicating that somatic telomeres are unstable and may continuously lose sequences from their termini.
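To make the idea of a tandem array of TTAGGG repeats concrete, the following is a minimal sketch, not part of the cited study, that scans a DNA string for the longest uninterrupted run of the repeat unit. The function name `longest_telomeric_run` and the example sequence are hypothetical and chosen only for illustration.

```python
import re


def longest_telomeric_run(seq, unit="TTAGGG"):
    """Return the largest number of back-to-back copies of `unit` in `seq`.

    Illustrative helper only (hypothetical name); it looks at the given
    strand and does not consider the reverse complement (CCCTAA).
    """
    # One or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = 0
    for match in pattern.finditer(seq.upper()):
        copies = len(match.group(0)) // len(unit)
        best = max(best, copies)
    return best


# Made-up sequence: three tandem copies, a two-base gap, then one more copy.
example = "ACGT" + "TTAGGG" * 3 + "CC" + "TTAGGG"
print(longest_telomeric_run(example))
```

A run of a few thousand copies would correspond to the multi-kilobase terminal segments described in the abstract, since each copy contributes 6 bp.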


Author(s):  
John Martinson ◽  
David C. Bencic ◽  
Gregory P. Toth ◽  
Mitchell S. Kostich ◽  
Robert W. Flick ◽  
...  

ABSTRACT The Fathead Minnow (FHM) is one of the most important and widely used model organisms in aquatic toxicology. The lack of a high-quality and well-annotated FHM reference genome, however, has severely hampered efforts to use modern 'omics approaches with FHM in environmental toxicogenomics studies. We present here a de novo assembled and nearly complete reference of the fathead minnow genome. Compared to the current fragmented and sparsely annotated FHM genome assembly (FHM1), the new highly contiguous and well-annotated FHM reference genome (FHM2) represents a major improvement, having 95.1% of the complete BUSCOs (Benchmarking Universal Single-Copy Orthologs) and a scaffold N50 of 12.0 Mbps. The completeness of gene annotation for the FHM2 reference genome was demonstrated to be comparable to that of the zebrafish (ZF) GRCz11 reference genome. In addition, our comparative genomics analyses between FHM and ZF revealed highly conserved coding regions between the two species while uncovering a much more compact gene structure in FHM than in ZF. This study not only provides insights for assembling a highly repetitive AT-rich genome, but also delivers a critical genomic resource essential for toxicogenomics studies in environmental toxicology.
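The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. The sketch below shows one common way to compute it; the function name `n50` and the toy lengths are assumptions for illustration and are not taken from the FHM2 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total assembly length;
    the length of the scaffold that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length


# Toy assembly (made-up lengths, arbitrary units).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract reports BUSCO completeness alongside it.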


Author(s):  
José Cerca ◽  
Marius F. Maurstad ◽  
Nicolas C. Rochette ◽  
Angel G. Rivera‐Colón ◽  
Niraj Rayamajhi ◽  
...  
Keyword(s):  
De Novo ◽  

Animals ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 2226
Author(s):  
Sazia Kunvar ◽  
Sylwia Czarnomska ◽  
Cino Pertoldi ◽  
Małgorzata Tokarska

The European bison is a non-model organism; thus, most of its genetic and genomic analyses have been performed using cattle-specific resources, such as BovineSNP50 BeadChip or Illumina Bovine 800 K HD Bead Chip. The problem with non-specific tools is the potential loss of evolutionary diversified information (ascertainment bias) and species-specific markers. Here, we have used a genotyping-by-sequencing (GBS) approach for genotyping 256 samples from the European bison population in Bialowieza Forest (Poland) and performed an analysis using two integrated pipelines of the STACKS software: one is de novo (without reference genome) and the other is a reference pipeline (with reference genome). Moreover, we used a reference pipeline with two different genomes, i.e., Bos taurus and European bison. Genotyping by sequencing (GBS) is a useful tool for SNP genotyping in non-model organisms due to its cost effectiveness. Our results support GBS with a reference pipeline without PCR duplicates as a powerful approach for studying the population structure and genotyping data of non-model organisms. We found more polymorphic markers in the reference pipeline in comparison to the de novo pipeline. The decreased number of SNPs from the de novo pipeline could be due to the extremely low level of heterozygosity in European bison. It has been confirmed that all the de novo/Bos taurus and Bos taurus reference pipeline obtained SNPs were unique and not included in 800 K BovineHD BeadChip.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Daniel Stribling ◽  
Peter L. Chang ◽  
Justin E. Dalton ◽  
Christopher A. Conow ◽  
Malcolm Rosenthal ◽  
...  

Abstract. Objectives: Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential for development of powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression. Data description: To examine sex-differential gene expression, short read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two different life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from public databases.


Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1342
Author(s):  
Shaghayegh Mehravi ◽  
Gholam Ali Ranjbar ◽  
Ghader Mirzaghaderi ◽  
Anita Alice Severn-Ellis ◽  
Armin Scheben ◽  
...  

The species of Pimpinella, one of the largest genera of the family Apiaceae, are traditionally cultivated for medicinal purposes. In this study, high-throughput double digest restriction-site associated DNA sequencing technology (ddRAD-seq) was used to identify single nucleotide polymorphisms (SNPs) in eight Pimpinella species from Iran. After double-digestion with the enzymes HpyCH4IV and HinfI, a total of 334,702,966 paired-end reads were de novo assembled into 1,270,791 loci with an average of 28.8 reads per locus. After stringent filtering, 2440 high-quality SNPs were identified for downstream analysis. Analysis of genetic relationships and population structure, based on these retained SNPs, indicated the presence of three major groups. Gene ontology and pathway analyses were performed by comparing SNP-associated flanking sequences with a public non-redundant database. Due to the lack of genomic resources in this genus, our present study is the first report to provide high-quality SNPs in Pimpinella based on a de novo analysis pipeline using ddRAD-seq. These data will enhance the molecular knowledge of the genus Pimpinella and will provide an important source of information for breeders and the research community to enhance breeding programs and support the management of Pimpinella genomic resources.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 1074
Author(s):  
Joanna Grzegorczyk ◽  
Artur Gurgul ◽  
Maria Oczkowicz ◽  
Tomasz Szmatoła ◽  
Agnieszka Fornal ◽  
...  

Poland is the largest European producer of geese, and goose breeding has become an essential and still growing branch of the poultry industry. The most frequently bred goose is the White Kołuda® breed, constituting 95% of the country's population, whereas geese of regional varieties are bred in smaller, conservation flocks. However, goose genetic diversity remains poorly explored, mainly because the most commonly used tools are of limited use in non-model organisms. Single nucleotide polymorphisms (SNPs) are among the most accurate markers used in population genetics. A highly efficient strategy for genome-wide SNP detection is genotyping-by-sequencing (GBS), which has already been widely applied in many organisms. This study attempts to use GBS in 12 conservation goose breeds and the White Kołuda® breed maintained in Poland. The GBS method allowed for the detection of 3833 common raw SNPs. After filtering for read depth and allele characters, however, the final marker panel used for differentiation analysis comprised 791 SNPs. These variants were located within 11 different genes, and one of the most diversified variants was associated with the EDAR gene, which is especially interesting as it participates in plumage development, which plays a crucial role in goose breeding.


1993 ◽  
Vol 13 (9) ◽  
pp. 5266-5275
Author(s):  
R D Palmiter ◽  
E P Sandgren ◽  
D M Koeller ◽  
R L Brinster

DNA regions of 10 and 7 kb that flank the mouse metallothionein II (MT-II) and MT-I genes, respectively, were combined with a minimally marked MT-I (MT-I*) gene and tested in transgenic mice. This construct resulted in (i) position-independent expression of MT-I* mRNA and copy number-dependent expression, (ii) levels of hepatic MT-I mRNA per cell per transgene that were about half that derived from endogenous MT-I genes, (iii) appropriate regulation by metals and hormones, and (iv) tissue distribution of transgene mRNA that resembled that of endogenous MT-I mRNA. These features were not observed when MT-I* was tested without the flanking regions. These MT-I flanking sequences also improved the expression of rat growth hormone reporter genes, with or without introns, that were under the control of the MT-I promoter. Moreover, they enhanced expression from two of four heterologous promoters/enhancers that were tested. Deletion analysis indicated that regions known to have DNase I-hypersensitive sites were necessary but not sufficient for high-level expression. These data suggest that the DNA regions flanking the mouse MT-I and MT-II genes have functions like the locus control regions described for other genes.


mSystems ◽  
2018 ◽  
Vol 3 (5) ◽  
Author(s):  
Sean Ting-Shyang Wei ◽  
Yu-Wei Wu ◽  
Tzong-Huei Lee ◽  
Yi-Shiang Huang ◽  
Cheng-Yu Yang ◽  
...  

ABSTRACT The 2,3-secopathway, the pathway for anaerobic cholesterol degradation, has been established in the denitrifying betaproteobacterium Sterolibacterium denitrificans. However, knowledge of how microorganisms respond to cholesterol at the community level is elusive. Here, we applied mesocosm incubation and 16S rRNA sequencing to reveal that, in denitrifying sludge communities, three betaproteobacterial operational taxonomic units (OTUs) with low (94% to 95%) 16S rRNA sequence similarity to Stl. denitrificans are cholesterol degraders and members of the rare biosphere. Metatranscriptomic and metabolite analyses show that these degraders adopt the 2,3-secopathway to sequentially catalyze the side chain and sterane of cholesterol and that two molybdoenzymes (steroid C25 dehydrogenase and 1-testosterone dehydrogenase/hydratase) are crucial for these bioprocesses, respectively. The metatranscriptome further suggests that these betaproteobacterial degraders display chemotaxis and motility toward cholesterol and that FadL-like transporters may be the key components for substrate uptake. Also, these betaproteobacteria are capable of transporting micronutrients and synthesizing cofactors essential for cellular metabolism and cholesterol degradation; however, the required cobalamin is possibly provided by cobalamin-de novo-synthesizing gamma-, delta-, and betaproteobacteria via the salvage pathway. Overall, our results indicate that the ability to degrade cholesterol in sludge communities is reserved for certain rare biosphere members and that C25 dehydrogenase can serve as a biomarker for sterol degradation in anoxic environments.
IMPORTANCE Steroids are ubiquitous and abundant natural compounds that display recalcitrance. Biodegradation via sludge communities in wastewater treatment plants is the primary removal process for steroids. To date, compared to studies for aerobic steroid degradation, the knowledge of anaerobic degradation of steroids has been based on only a few model organisms. Due to the increase of anthropogenic impacts, steroid inputs may affect microbial diversity and functioning in ecosystems. Here, we first investigated microbial functional responses to cholesterol, the most abundant steroid in sludge, at the community level. Our metagenomic and metatranscriptomic analyses revealed that the capacities for cholesterol approach, uptake, and degradation are unique traits of certain low-abundance betaproteobacteria, indicating the importance of the rare biosphere in bioremediation. Apparent expression of genes involved in cofactor de novo synthesis and salvage pathways suggests that these micronutrients play important roles for cholesterol degradation in sludge communities.

