Assexon: Assembling Exon Using Gene Capture Data

Structure and variability of human chromosome ends

Molecular and Cellular Biology ◽

10.1128/mcb.10.2.518-527.1990 ◽

1990 ◽

Vol 10 (2) ◽

pp. 518-527

Author(s):

T de Lange ◽

L Shiue ◽

R M Myers ◽

D R Cox ◽

S L Naylor ◽

...

Keyword(s):

De Novo ◽

Hybrid Cell ◽

Somatic Cells ◽

Terminal Segment ◽

Human Chromosomes ◽

Primary Tumors ◽

Length Polymorphisms ◽

Subtelomeric Repeat ◽

Flanking Sequences ◽

Flanking Regions

Mammalian telomeres are thought to be composed of a tandem array of TTAGGG repeats. To further define the type and arrangement of sequences at the ends of human chromosomes, we developed a direct cloning strategy for telomere-associated DNA. The method involves a telomere enrichment procedure based on the relative lack of restriction endonuclease cutting sites near the ends of human chromosomes. Nineteen (TTAGGG)n-bearing plasmids were isolated, two of which contain additional human sequences proximal to the telomeric repeats. These telomere-flanking sequences detect BAL 31-sensitive loci and thus are located close to chromosome ends. One of the flanking regions is part of a subtelomeric repeat that is present at 10 to 25% of the chromosome ends in the human genome. This sequence is not conserved in rodent DNA and therefore should be a helpful tool for physical characterization of human chromosomes in human-rodent hybrid cell lines; some of the chromosomes that may be analyzed in this manner have been identified, i.e., 7, 16, 17, and 21. The minimal size of the subtelomeric repeat is 4 kilobases (kb); it shows a high frequency of restriction fragment length polymorphisms and undergoes extensive de novo methylation in somatic cells. Distal to the subtelomeric repeat, the chromosomes terminate in a long region (up to 14 kb) that may be entirely composed of TTAGGG repeats. This terminal segment is unusually variable. Although sperm telomeres are 10 to 14 kb long, telomeres in somatic cells are several kilobase pairs shorter and very heterogeneous in length. Additional telomere reduction occurs in primary tumors, indicating that somatic telomeres are unstable and may continuously lose sequences from their termini.
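The tandem TTAGGG array described in this abstract can be illustrated with a short sequence scan. The following is a minimal sketch, not the authors' cloning or hybridization method: the function name `telomere_runs`, the `min_repeats` threshold, and the toy sequence are all hypothetical, and only perfect forward-strand repeats are counted (the complementary CCCTAA strand and degenerate repeats are ignored).

```python
import re

def telomere_runs(seq, min_repeats=2):
    """Return (start, end, n_repeats) for each perfect tandem TTAGGG run.

    Coordinates are 0-based and half-open; min_repeats is an illustrative
    threshold, not a value taken from the abstract.
    """
    runs = []
    for match in re.finditer(r"(?:TTAGGG)+", seq.upper()):
        n_repeats = (match.end() - match.start()) // 6
        if n_repeats >= min_repeats:
            runs.append((match.start(), match.end(), n_repeats))
    return runs

# Toy sequence: 4 flanking bases, five TTAGGG repeats, 5 trailing bases.
toy = "ACGT" + "TTAGGG" * 5 + "CCATG"
print(telomere_runs(toy))  # [(4, 34, 5)]
```

A real analysis would likely tolerate imperfect repeats and scan both strands; this version only shows the basic idea of locating a tandem array.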

De novo assembly and annotation of a highly contiguous reference genome of the fathead minnow (Pimephales promelas) reveals an AT-rich repetitive genome with compact gene structure

10.1101/2021.02.24.432777 ◽

2021 ◽

Cited By ~ 1

Author(s):

John Martinson ◽

David C. Bencic ◽

Gregory P. Toth ◽

Mitchell S. Kostich ◽

Robert W. Flick ◽

...

Keyword(s):

Gene Structure ◽

Fathead Minnow ◽

Reference Genome ◽

De Novo ◽

Gene Annotation ◽

Pimephales Promelas ◽

Single Copy ◽

Model Organisms ◽

Coding Regions ◽

Genomic Resource

The Fathead Minnow (FHM) is one of the most important and widely used model organisms in aquatic toxicology. The lack of a high-quality and well-annotated FHM reference genome, however, has severely hampered efforts to use modern ‘omics approaches with FHM for environmental toxicogenomics studies. We present here a de novo assembled and nearly complete reference of the fathead minnow genome. Compared to the current fragmented and sparsely annotated FHM genome assembly (FHM1), the new highly contiguous and well-annotated FHM reference genome (FHM2) represents a major improvement, having 95.1% of the complete BUSCOs (Benchmarking Universal Single-Copy Orthologs) and a scaffold N50 of 12.0 Mbp. The completeness of gene annotation for the FHM2 reference genome was demonstrated to be comparable to that of the zebrafish (ZF) GRCz11 reference genome. In addition, our comparative genomics analyses between FHM and ZF revealed highly conserved coding regions between the two species while discovering a much more compact gene structure in FHM than in ZF. This study not only provides insights for assembling a highly repetitive AT-rich genome, but also delivers a critical genomic resource essential for toxicogenomics studies in environmental toxicology.
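The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The following is a minimal sketch of that definition; the function name `n50` and the toy lengths are hypothetical and are not taken from the FHM2 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length

# Toy assembly of eight scaffolds totalling 32 units.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))  # 8
```

In the toy example the two longest scaffolds (8 + 8 = 16) already cover half of the 32-unit total, so the N50 is 8.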

Removing the bad apples: A simple bioinformatic method to improve loci‐recovery in de novo RADseq data for non‐model organisms

Methods in Ecology and Evolution ◽

10.1111/2041-210x.13562 ◽

2021 ◽

Cited By ~ 1

Author(s):

José Cerca ◽

Marius F. Maurstad ◽

Nicolas C. Rochette ◽

Angel G. Rivera‐Colón ◽

Niraj Rayamajhi ◽

...

Keyword(s):

De Novo ◽

Model Organisms

In Search of Species-Specific SNPs in a Non-Model Animal (European Bison (Bison bonasus))—Comparison of De Novo and Reference-Based Integrated Pipeline of STACKS Using Genotyping-by-Sequencing (GBS) Data

Animals ◽

10.3390/ani11082226 ◽

2021 ◽

Vol 11 (8) ◽

pp. 2226

Author(s):

Sazia Kunvar ◽

Sylwia Czarnomska ◽

Cino Pertoldi ◽

Małgorzata Tokarska

Keyword(s):

Reference Genome ◽

De Novo ◽

Bos Taurus ◽

Model Organism ◽

Genotyping By Sequencing ◽

Model Organisms ◽

European Bison ◽

Model Animal ◽

Pcr Duplicates ◽

Species Specific

The European bison is a non-model organism; thus, most of its genetic and genomic analyses have been performed using cattle-specific resources, such as BovineSNP50 BeadChip or Illumina Bovine 800 K HD Bead Chip. The problem with non-specific tools is the potential loss of evolutionarily diversified information (ascertainment bias) and species-specific markers. Here, we have used a genotyping-by-sequencing (GBS) approach for genotyping 256 samples from the European bison population in Bialowieza Forest (Poland) and performed an analysis using two integrated pipelines of the STACKS software: one is de novo (without reference genome) and the other is a reference pipeline (with reference genome). Moreover, we used a reference pipeline with two different genomes, i.e., Bos taurus and European bison. GBS is a useful tool for SNP genotyping in non-model organisms due to its cost effectiveness. Our results support GBS with a reference pipeline without PCR duplicates as a powerful approach for studying the population structure and genotyping data of non-model organisms. We found more polymorphic markers in the reference pipeline in comparison to the de novo pipeline. The decreased number of SNPs from the de novo pipeline could be due to the extremely low level of heterozygosity in European bison. It has been confirmed that all the SNPs obtained from the de novo/Bos taurus and Bos taurus reference pipelines were unique and not included in the 800 K BovineHD BeadChip.

The brain transcriptome of the wolf spider, Schizocosa ocreata

BMC Research Notes ◽

10.1186/s13104-021-05648-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Daniel Stribling ◽

Peter L. Chang ◽

Justin E. Dalton ◽

Christopher A. Conow ◽

Malcolm Rosenthal ◽

...

Keyword(s):

Gene Expression ◽

De Novo ◽

Transcriptome Assembly ◽

Model Organisms ◽

De Novo Transcriptome Assembly ◽

De Novo Transcriptome ◽

Wolf Spiders ◽

Schizocosa Ocreata ◽

Genomic Studies ◽

The Brain

Objectives: Arachnids have fascinating and unique biology, particularly for questions on sex differences and behavior, creating the potential for development of powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression. Data description: To examine sex-differential gene expression, short read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two different life stages in each sex. Computational analyses on these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from publicly available databases.

De Novo SNP Discovery and Genotyping of Iranian Pimpinella Species Using ddRAD Sequencing

Agronomy ◽

10.3390/agronomy11071342 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1342

Author(s):

Shaghayegh Mehravi ◽

Gholam Ali Ranjbar ◽

Ghader Mirzaghaderi ◽

Anita Alice Severn-Ellis ◽

Armin Scheben ◽

...

Keyword(s):

De Novo ◽

Genetic Relationships ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Genomic Resources ◽

High Quality Snps ◽

The Family ◽

Double Digestion ◽

Flanking Sequences ◽

Downstream Analysis

The species of Pimpinella, one of the largest genera of the family Apiaceae, are traditionally cultivated for medicinal purposes. In this study, high-throughput double digest restriction-site associated DNA sequencing technology (ddRAD-seq) was used to identify single nucleotide polymorphisms (SNPs) in eight Pimpinella species from Iran. After double-digestion with the enzymes HpyCH4IV and HinfI, a total of 334,702,966 paired-end reads were de novo assembled into 1,270,791 loci with an average of 28.8 reads per locus. After stringent filtering, 2440 high-quality SNPs were identified for downstream analysis. Analysis of genetic relationships and population structure, based on these retained SNPs, indicated the presence of three major groups. Gene ontology and pathway analyses were performed by comparing SNP-associated flanking sequences with a public non-redundant database. Due to the lack of genomic resources in this genus, our present study is the first report to provide high-quality SNPs in Pimpinella based on a de novo analysis pipeline using ddRAD-seq. These data will enhance the molecular knowledge of the genus Pimpinella and will provide an important source of information for breeders and the research community to enhance breeding programs and support the management of Pimpinella genomic resources.

Single Nucleotide Polymorphism Discovery and Genetic Differentiation Analysis of Geese Bred in Poland, Using Genotyping-by-Sequencing (GBS)

Genes ◽

10.3390/genes12071074 ◽

2021 ◽

Vol 12 (7) ◽

pp. 1074

Author(s):

Joanna Grzegorczyk ◽

Artur Gurgul ◽

Maria Oczkowicz ◽

Tomasz Szmatoła ◽

Agnieszka Fornal ◽

...

Keyword(s):

Genotyping By Sequencing ◽

Read Depth ◽

Model Organisms ◽

Single Nucleotide Polymorphism Discovery ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Polymorphism Discovery ◽

Genome Wide ◽

Plumage Development ◽

Edar Gene

Poland is the largest European producer of goose, and goose breeding has become an essential and still growing branch of the poultry industry. The most frequently bred goose is the White Kołuda® breed, constituting 95% of the country’s population, whereas geese of regional varieties are bred in smaller conservation flocks. However, goose genetic diversity remains poorly explored, mainly because the most commonly used tools are of limited use in non-model organisms. Among the most accurate markers used in population genetics are single nucleotide polymorphisms (SNPs). A highly efficient strategy for genome-wide SNP detection is genotyping-by-sequencing (GBS), which has already been widely applied in many organisms. This study attempts to use GBS in 12 conservation goose breeds and the White Kołuda® breed maintained in Poland. The GBS method allowed for the detection of 3833 common raw SNPs. After filtering for read depth and allele characteristics, the final marker panel used for the differentiation analysis comprised 791 SNPs. These variants were located within 11 different genes, and one of the most diversified variants was associated with the EDAR gene, which is especially interesting because it participates in plumage development, which plays a crucial role in goose breeding.

Distal regulatory elements from the mouse metallothionein locus stimulate gene expression in transgenic mice

Molecular and Cellular Biology ◽

10.1128/mcb.13.9.5266-5275.1993 ◽

1993 ◽

Vol 13 (9) ◽

pp. 5266-5275

Author(s):

R D Palmiter ◽

E P Sandgren ◽

D M Koeller ◽

R L Brinster

Keyword(s):

Transgenic Mice ◽

Regulatory Elements ◽

High Level Expression ◽

Hypersensitive Sites ◽

Enhanced Expression ◽

Rat Growth ◽

Flanking Sequences ◽

High Level ◽

Distal Regulatory Elements ◽

Flanking Regions

DNA regions of 10 and 7 kb that flank the mouse metallothionein II (MT-II) and MT-I genes, respectively, were combined with a minimally marked MT-I (MT-I*) gene and tested in transgenic mice. This construct resulted in (i) position-independent expression of MT-I* mRNA and copy number-dependent expression, (ii) levels of hepatic MT-I mRNA per cell per transgene that were about half that derived from endogenous MT-I genes, (iii) appropriate regulation by metals and hormones, and (iv) tissue distribution of transgene mRNA that resembled that of endogenous MT-I mRNA. These features were not observed when MT-I* was tested without the flanking regions. These MT-I flanking sequences also improved the expression of rat growth hormone reporter genes, with or without introns, that were under the control of the MT-I promoter. Moreover, they enhanced expression from two of four heterologous promoters/enhancers that were tested. Deletion analysis indicated that regions known to have DNase I-hypersensitive sites were necessary but not sufficient for high-level expression. These data suggest that the DNA regions flanking the mouse MT-I and MT-II genes have functions like the locus control regions described for other genes.

Microbial Functional Responses to Cholesterol Catabolism in Denitrifying Sludge

mSystems ◽

10.1128/msystems.00113-18 ◽

2018 ◽

Vol 3 (5) ◽

Cited By ~ 6

Author(s):

Sean Ting-Shyang Wei ◽

Yu-Wei Wu ◽

Tzong-Huei Lee ◽

Yi-Shiang Huang ◽

Cheng-Yu Yang ◽

...

Keyword(s):

16S Rrna ◽

De Novo ◽

Anthropogenic Impacts ◽

Sequence Similarity ◽

Community Level ◽

Model Organisms ◽

Substrate Uptake ◽

Functional Responses ◽

Content Type ◽

Rare Biosphere

The 2,3-seco pathway, the pathway for anaerobic cholesterol degradation, has been established in the denitrifying betaproteobacterium Sterolibacterium denitrificans. However, knowledge of how microorganisms respond to cholesterol at the community level is elusive. Here, we applied mesocosm incubation and 16S rRNA sequencing to reveal that, in denitrifying sludge communities, three betaproteobacterial operational taxonomic units (OTUs) with low (94% to 95%) 16S rRNA sequence similarity to Stl. denitrificans are cholesterol degraders and members of the rare biosphere. Metatranscriptomic and metabolite analyses show that these degraders adopt the 2,3-seco pathway to sequentially catalyze the side chain and sterane of cholesterol and that two molybdoenzymes (steroid C25 dehydrogenase and 1-testosterone dehydrogenase/hydratase) are crucial for these bioprocesses, respectively. The metatranscriptome further suggests that these betaproteobacterial degraders display chemotaxis and motility toward cholesterol and that FadL-like transporters may be the key components for substrate uptake. Also, these betaproteobacteria are capable of transporting micronutrients and synthesizing cofactors essential for cellular metabolism and cholesterol degradation; however, the required cobalamin is possibly provided by cobalamin-de novo-synthesizing gamma-, delta-, and betaproteobacteria via the salvage pathway. Overall, our results indicate that the ability to degrade cholesterol in sludge communities is reserved for certain rare biosphere members and that C25 dehydrogenase can serve as a biomarker for sterol degradation in anoxic environments.

Importance: Steroids are ubiquitous and abundant natural compounds that display recalcitrance. Biodegradation via sludge communities in wastewater treatment plants is the primary removal process for steroids. To date, compared to studies for aerobic steroid degradation, the knowledge of anaerobic degradation of steroids has been based on only a few model organisms. Due to the increase of anthropogenic impacts, steroid inputs may affect microbial diversity and functioning in ecosystems. Here, we first investigated microbial functional responses to cholesterol, the most abundant steroid in sludge, at the community level. Our metagenomic and metatranscriptomic analyses revealed that the capacities for cholesterol approach, uptake, and degradation are unique traits of certain low-abundance betaproteobacteria, indicating the importance of the rare biosphere in bioremediation. Apparent expression of genes involved in cofactor de novo synthesis and salvage pathways suggests that these micronutrients play important roles for cholesterol degradation in sludge communities.
