Genomic Structure and Comparative Analysis of Nine Fugu Genes: Conservation of Synteny with Human Chromosome Xp22.2–p22.1

Bodo Brunner; Tilman Todt; Steffen Lenzner; Karen Stout; Ute Schulz; Hans-Hilger Ropers; Vera M. Kalscheuer

doi:10.1101/gr.9.5.437


Genome Research ◽ 10.1101/gr.9.5.437 ◽ 1999 ◽ Vol 9 (5) ◽ pp. 437-448 ◽ Cited By ~ 3

Author(s): Bodo Brunner ◽ Tilman Todt ◽ Steffen Lenzner ◽ Karen Stout ◽ Ute Schulz ◽ ...

Keyword(s): Human Genome ◽ Sequence Data ◽ Genomic Structure ◽ Evolutionary Conservation ◽ Fugu Rubripes ◽ Link Type ◽ Human Genes ◽ Number Of Genes ◽ Gene Structures ◽ Data Libraries

The pufferfish Fugu rubripes has a compact 400-Mb genome that is ∼7.5 times smaller than the human genome but contains a similar number of genes. Focusing on the distal short arm of the human X chromosome, we have studied the evolutionary conservation of gene orders in Fugu and man. Sequencing of 68 kb of Fugu genomic DNA identified nine genes in the following order: (SCML2)-STK9, XLRS1, PPEF-1, KELCH2, KELCH1, PHKA2, AP19, and U2AF1-RS2. Apart from an evolutionary inversion separating AP19 and U2AF1-RS2 from PHKA2, gene orders are identical in Fugu and man, and all nine human homologs map to the Xp22 band. All Fugu genes were found to be smaller than their human counterparts, but gene structures were mostly identical. These data suggest that genomic sequencing in Fugu is a powerful and economical strategy to predict gene orders in the human genome and to elucidate the structure of human genes. [Sequence data for this article were deposited with the EMBL/GenBank data libraries under accession nos. AJ011381 and AF094327.]
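The size figures quoted in this abstract can be checked with a little arithmetic. The following is only a minimal sketch: the constants are the rounded values from the abstract, and the function names are illustrative, not taken from the paper.

```python
# Rounded figures quoted in the abstract (approximate by nature).
FUGU_GENOME_MB = 400   # Fugu genome size, in megabases
SIZE_RATIO = 7.5       # the human genome is ~7.5 times larger
SEQUENCED_KB = 68      # Fugu genomic DNA sequenced in the study, in kilobases
GENES_FOUND = 9        # genes identified in that interval


def implied_human_genome_mb(fugu_mb: float, ratio: float) -> float:
    """Human genome size implied by the Fugu size and the size ratio."""
    return fugu_mb * ratio


def mean_kb_per_gene(region_kb: float, n_genes: int) -> float:
    """Average length of sequenced DNA per gene in the interval."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_kb / n_genes


human_mb = implied_human_genome_mb(FUGU_GENOME_MB, SIZE_RATIO)
kb_per_gene = mean_kb_per_gene(SEQUENCED_KB, GENES_FOUND)

print(f"Implied human genome size: ~{human_mb:.0f} Mb")
print(f"Mean spacing in the Fugu interval: ~{kb_per_gene:.1f} kb per gene")
```

Under these rounded inputs, the implied human genome size is about 3000 Mb, and the sequenced Fugu interval averages roughly 7.6 kb per gene.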


Definition of the Gene Content of the Human Genome: The Need for Deep Experimental Verification

Comparative and Functional Genomics ◽ 10.1002/cfg.81 ◽ 2001 ◽ Vol 2 (3) ◽ pp. 169-175 ◽ Cited By ~ 2

Author(s): Andrew J. G. Simpson ◽ Sandro J. de Souza ◽ Anamaria A. Camargo ◽ Ricardo R. Brentani

Keyword(s): Human Genome ◽ Gene Structure ◽ Experimental Verification ◽ Human Gene ◽ Sequence Data ◽ Gene Prediction ◽ Human Genes ◽ Number Of Genes ◽ Definition Of

Based on the analysis of the drafts of the human genome sequence, it is being speculated that our species may possess an unexpectedly low number of genes. The quality of the drafts, the impossibility of accurate gene prediction and the lack of sufficient transcript sequence data, however, render such speculations very premature. The complexity of human gene structure requires additional and extensive experimental verification of transcripts that may result in major revisions of these early estimates of the number of human genes.


Genomic Characterization of Human DSPG3

Genome Research ◽ 10.1101/gr.9.5.449 ◽ 1999 ◽ Vol 9 (5) ◽ pp. 449-456

Author(s): Michelle Deere ◽ Jose L. Dieguez ◽ Sung-Joo Kim Yoon ◽ David Hewett-Emmett ◽ Albert de la Chapelle ◽ ...

Keyword(s): Transcription Start Site ◽ Sequence Data ◽ Stop Codon ◽ Genomic Structure ◽ Start Codon ◽ Start Site ◽ Transcription Start ◽ Exon 2 ◽ Ancestral Gene ◽ Link Type

DSPG3, the human homolog of chick PG-Lb, is a member of the small leucine-rich repeat proteoglycan (SLRP) family, including decorin, biglycan, fibromodulin, and lumican. In contrast to the tissue distribution of the other SLRPs, DSPG3 is predominantly expressed in cartilage. In this study, we have determined that the human DSPG3 gene is composed of seven exons: exon 2 of DSPG3 includes the start codon, exons 4–7 code for the leucine-rich repeats, exons 3 and 7 contain the potential glycosaminoglycan attachment sites, and exon 7 contains the potential N-glycosylation sites and the stop codon. We have identified two polymorphic variations, an insertion/deletion composed of 19 nucleotides in intron 1 and a tetranucleotide (TATT)n repeat in intron 5. Analysis of 1.6 kb of upstream promoter sequence of DSPG3 reveals three TATA boxes, one of which is 20 nucleotides before the transcription start site. The transcription start site precedes the translation start site by 98 nucleotides. There are 14 potential binding sites for SOX9, a transcription factor present in cartilage, in the promoter and in the first intron of DSPG3. We have examined the evolution of the SLRP gene family and found that gene products clustered together in the evolutionary tree are encoded by genes with similarities in genomic structure. Hence, it appears that the majority of the introns in the SLRP genes were inserted after the differentiation of the SLRP genes from an ancestral gene that was most likely composed of 2–3 exons. [The sequence data described in this paper have been submitted to GenBank under accession nos. AF031658 and U63814.]


Characterization of Nonfunctional V1R-like Pheromone Receptor Sequences in Human

Genome Research ◽ 10.1101/gr.146700 ◽ 2000 ◽ Vol 10 (12) ◽ pp. 1979-1985 ◽ Cited By ~ 1

Author(s): Dominique Giorgi ◽ Cynthia Friedman ◽ Barbara J. Trask ◽ Sylvie Rouquier

Keyword(s): Transmembrane Domain ◽ Sequence Data ◽ Sensory System ◽ Vomeronasal Organ ◽ Gene Families ◽ Somatic Cell Hybrid Panel ◽ Terrestrial Vertebrates ◽ Link Type ◽ Characteristic Features ◽ Data Libraries

The vomeronasal organ (VNO) or Jacobson's organ is responsible in terrestrial vertebrates for the sensory perception of pheromones, chemicals that elicit stereotyped behaviors among individuals of the same species. Pheromone-induced behaviors and a functional VNO have been described in a number of mammals, but the existence of this sensory system in humans is still debated. Recently, two nonhomologous gene families, V1R and V2R, encoding pheromone receptors have been identified in rat. These receptors belong to the seven-transmembrane domain G-protein-coupled receptor superfamily. We sought to characterize V1R-like genes in the human genome. We have identified seven different human sequences by PCR and library screening with rodent sequences. These human sequences exhibit characteristic features of V1R receptors and show 52%–59% amino acid sequence identity with the rat sequences. Using PCR on a monochromosomal somatic cell hybrid panel and/or FISH, we demonstrate that these V1R-like sequences are distributed on chromosomes 7, 16, 20, 13, 14, 15, 21, and 22 and possibly on additional chromosomes. One sequence hybridizes to pericentromeric locations on all the acrocentric chromosomes (13, 14, 15, 21, and 22). All of the seven V1R-like sequences analyzed show interrupted reading frames, indicating that they represent nonfunctional pseudogenes. The preponderance of pseudogenes among human V1R sequences and the striking anatomical differences between rodent and human VNO raise the possibility that humans may have lost the V1R/VNO-mediated sensory functions of rodents. [Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under accession nos. U73852–73853 and AF253312–253316.]
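The abstract reports 52%–59% amino acid sequence identity between the human and rat sequences. As a rough illustration of what such a figure measures, here is a minimal sketch of percent identity over a pairwise alignment. It assumes the two sequences are already aligned to equal length and that columns containing a gap are skipped; definitions of the denominator vary between tools, and the abstract does not say which one the authors used. The example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # compared columns with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real V1R sequence data.
identity = percent_identity("MKT-LLV", "MRTALLI")
print(f"{identity:.1f}% identity")
```

In this toy alignment, six columns are gap-free and four of them match, so the function returns about 66.7%.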


Finding New Human Minisatellite Sequences in the Vicinity of Long CA-Rich Sequences

Genome Research ◽ 10.1101/gr.9.7.647 ◽ 1999 ◽ Vol 9 (7) ◽ pp. 647-653 ◽ Cited By ~ 1

Author(s): Fabienne Giraudeau ◽ Elisabeth Petit ◽ Hervé Avet-Loiseau ◽ Yolande Hauck ◽ Gilles Vergnaud ◽ ...

Keyword(s): Human Genome ◽ Sequence Data ◽ Chromosome 1 ◽ Chromosomal Distribution ◽ Repeat Sequences ◽ Link Type ◽ Tandem Repeat Sequences ◽ Sequences Analysis ◽ Data Library ◽ Chromosomal Bands

Microsatellites and minisatellites are two classes of tandem repeat sequences differing in their size, mutation processes, and chromosomal distribution. The boundary between the two classes is not defined. We have developed a convenient, hybridization-based human library screening procedure able to detect long CA-rich sequences. Analysis of cosmid clones derived from a chromosome 1 library shows that the cross-hybridizing sequences tested are imperfect CA-rich sequences, some of them showing a minisatellite organization. All but one of the 13 positive chromosome 1 clones studied are localized in chromosomal bands to which minisatellites have previously been assigned, such as the 1pter cluster. To test the applicability of the procedure to minisatellite detection on a larger scale, we then used a large-insert whole-genome PAC library. Altogether, 22 new minisatellites have been identified in positive PAC and cosmid clones and 20 of them are telomeric. Among the 42 positive PAC clones localized within the human genome by FISH and/or linkage analysis, 25 (60%) are assigned to a terminal band of the karyotype, 4 (9%) are juxtacentromeric, and 13 (31%) are interstitial. The localization of at least two of the interstitial PAC clones corresponds to previously characterized minisatellite-containing regions and/or ancestrally telomeric bands, in agreement with this minisatellite-like distribution. The data obtained are in close agreement with the parallel investigation of human genome sequence data and suggest that long human (CA)s are imperfect CA repeats belonging to the minisatellite class of sequences. This approach provides a new tool to efficiently target genomic clones originating from subtelomeric domains, from which minisatellite sequences can readily be obtained. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ000377–AJ000383.]


Mapping Z-DNA in the human genome. Computer-aided mapping reveals a nonrandom distribution of potential Z-DNA-forming sequences in human genes.

Journal of Biological Chemistry ◽ 10.1016/s0021-9258(19)49776-7 ◽ 1992 ◽ Vol 267 (17) ◽ pp. 11846-11855

Author(s): G.P. Schroth ◽ P.J. Chou ◽ P.S. Ho

Keyword(s): Human Genome ◽ Human Genes ◽ Computer Aided ◽ Nonrandom Distribution ◽ Z-DNA


Identification and Characterization of Novel Human Endogenous Retrovirus Families by Phylogenetic Screening of the Human Genome Mapping Project Database

Journal of Virology ◽ 10.1128/jvi.74.8.3715-3730.2000 ◽ 2000 ◽ Vol 74 (8) ◽ pp. 3715-3730 ◽ Cited By ~ 202

Author(s): Michael Tristem

Keyword(s): Human Genome ◽ Genome Mapping ◽ Sequence Data ◽ Endogenous Retrovirus ◽ Endogenous Retroviruses ◽ Human Endogenous Retrovirus ◽ Sequence Information ◽ Class III ◽ Genome Mapping Project ◽ Human Genome Mapping Project

Human endogenous retroviruses (HERVs) were first identified almost 20 years ago, and since then numerous families have been described. It has, however, been difficult to obtain a good estimate of both the total number of independently derived families and their relationship to each other as well as to other members of the family Retroviridae. In this study, I used sequence data derived from over 150 novel HERVs, obtained from the Human Genome Mapping Project database, and a variety of recently identified nonhuman retroviruses to classify the HERVs into 22 independently acquired families. Of these, 17 families were loosely assigned to the class I HERVs, 3 to the class II HERVs and 2 to the class III HERVs. Many of these families have been identified previously, but six are described here for the first time and another four, for which only partial sequence information was previously available, were further characterized. Members of each of the 10 families are defective, and calculation of their integration dates suggested that most of them are likely to have been present within the human lineage since it diverged from the Old World monkeys more than 25 million years ago.


Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

International Journal of Systematic and Evolutionary Microbiology ◽ 10.1099/ijs.0.054171-0 ◽ 2014 ◽ Vol 64 (Pt 2) ◽ pp. 316-324 ◽ Cited By ~ 258

Author(s): Jongsik Chun ◽ Fred A. Rainey

Keyword(s): Genomic Sequence ◽ Sequence Data ◽ Original Research ◽ rRNA Gene ◽ New Taxon ◽ Genome Sequences ◽ Microbial World ◽ Content Type ◽ Link Type ◽ Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.
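The gap in type-strain coverage described in this abstract can be made concrete with a short calculation. This is only a sketch: the three constants are the approximate counts quoted in the abstract ("some 12 000", "1725", "nearly 11 000"), so the results are rough estimates, and the variable names are illustrative.

```python
# Approximate counts quoted in the abstract.
GENOMES_AVAILABLE = 12_000    # bacterial or archaeal genome sequences available
TYPE_STRAIN_GENOMES = 1_725   # of those, genomes from actual type strains
TYPE_STRAINS_TOTAL = 11_000   # type strains in existence


def fraction(part: int, whole: int) -> float:
    """Return part / whole, guarding against an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


share_of_genomes = fraction(TYPE_STRAIN_GENOMES, GENOMES_AVAILABLE)
coverage = fraction(TYPE_STRAIN_GENOMES, TYPE_STRAINS_TOTAL)
remaining = TYPE_STRAINS_TOTAL - TYPE_STRAIN_GENOMES

print(f"Type strains among available genomes: {share_of_genomes:.1%}")
print(f"Type strains with a genome sequence:  {coverage:.1%}")
print(f"Type strains still to be sequenced:   ~{remaining}")
```

With these rounded inputs, roughly 14% of the available genomes come from type strains, about 16% of type strains have a genome sequence, and on the order of 9,000 type strains remain unsequenced.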


Evolution of the cystatin B gene: implications for the origin of its variable dodecamer tandem repeat in humans

[Sequence data from this article have been deposited with the DDBJ/EMBL/GenBank Data Libraries under Accession Nos. AB083085 to AB083089, AB083416, and AB083417.]

Genomics ◽ 10.1016/s0888-7543(02)00010-1 ◽ 2003 ◽ Vol 81 (1) ◽ pp. 78-84 ◽ Cited By ~ 6

Author(s): Motoki Osawa ◽ Mika Kaneko ◽ Hidekazu Horiuchi ◽ Takashi Kitano ◽ Yoshi Kawamoto ◽ ...

Keyword(s): Tandem Repeat ◽ Sequence Data ◽ Cystatin B ◽ Data Libraries


Machine learning can differentiate venom toxins from other proteins having non-toxic physiological functions

PeerJ Computer Science ◽ 10.7717/peerj-cs.90 ◽ 2016 ◽ Vol 2 ◽ pp. e90 ◽ Cited By ~ 24

Author(s): Ranko Gacesa ◽ David J. Barlow ◽ Paul F. Long

Keyword(s): Machine Learning ◽ Sequence Data ◽ Biological Data ◽ Biological Databases ◽ Web Based ◽ Physiological Functions ◽ Link Type ◽ Venom Toxins ◽ Venomous Animals ◽ Toxin Protein

Ascribing function to sequence in the absence of biological data is an ongoing challenge in bioinformatics. Differentiating the toxins of venomous animals from homologues having other physiological functions is particularly problematic as there are no universally accepted methods by which to attribute toxin function using sequence data alone. Bioinformatics tools that do exist are difficult to implement for researchers with little bioinformatics training. Here we announce a machine learning tool called ‘ToxClassifier’ that enables simple and consistent discrimination of toxins from non-toxin sequences with >99% accuracy and compare it to commonly used toxin annotation methods. ‘ToxClassifier’ also reports the best-hit annotation allowing placement of a toxin into the most appropriate toxin protein family, or relates it to a non-toxic protein having the closest homology, giving enhanced curation of existing biological databases and new venomics projects. ‘ToxClassifier’ is available for free, either to download (https://github.com/rgacesa/ToxClassifier) or to use on a web-based server (http://bioserv7.bioinfo.pbf.hr/ToxClassifier/).


Gene Ontology Meta Annotator for Plants

10.1101/809988 ◽ 2019 ◽ Cited By ~ 1

Author(s): Kokulapalan Wimalanathan ◽ Carolyn J. Lawrence-Dill

Keyword(s): Gene Ontology ◽ GO Annotation ◽ Plant Genomes ◽ Link Type ◽ Gene Structures ◽ And Performance ◽ Genome Assemblies ◽ Genome Scale ◽ Per Gene

Annotating gene structures and functions to genome assemblies is a must to make assembly resources useful for biological inference. Gene Ontology (GO) term assignment is the most pervasively used functional annotation system, and new methods for GO assignment have improved the quality of GO-based function predictions. The Gene Ontology Meta Annotator for Plants (GOMAP) is an optimized, high-throughput, and reproducible pipeline for genome-scale GO annotation for plant genomes. GOMAP’s methods have been shown to expand and improve the number of genes annotated and annotations assigned per gene as well as the quality (based on F-score) of GO assignments in maize. Here we report on the pipeline’s availability and performance for annotating large, repetitive plant genomes and describe how to deploy GOMAP to annotate additional plant genomes. We containerized GOMAP to increase portability and reproducibility, and optimized its performance for HPC environments. GOMAP has been used to annotate multiple maize lines, and is currently being deployed to annotate other species including wheat, rice, barley, cotton, soy, and others. Instructions along with access to the GOMAP Singularity container are freely available online at https://gomap-singularity.readthedocs.io/en/latest/. A list of annotated genomes and links to data is maintained at https://dill-picl.org/projects/gomap/gomap-datasets/.
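The abstract measures annotation quality by F-score. As a reminder of what that metric combines, here is a minimal sketch of a set-based F-score for one gene's predicted terms against a reference set. It assumes a plain set comparison; the evaluation behind GOMAP may use a different variant (for example, one that accounts for the GO hierarchy), and the term labels below are hypothetical placeholders, not real GO identifiers.

```python
def f_score(predicted: set, reference: set, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall for two term sets."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0  # no overlap: precision and recall are both zero

    precision = true_pos / len(predicted)  # share of predictions that are correct
    recall = true_pos / len(reference)     # share of reference terms recovered
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical annotation sets for a single gene.
predicted_terms = {"term_a", "term_b", "term_c"}
reference_terms = {"term_b", "term_c", "term_d", "term_e"}

score = f_score(predicted_terms, reference_terms)
print(f"F1 = {score:.3f}")
```

Here two of three predictions are correct (precision 2/3) and two of four reference terms are recovered (recall 1/2), giving an F1 of 4/7, or about 0.571.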
