Gene Discovery Using Computational and Microarray Analysis of Transcription in the Drosophila melanogaster Testis

Justen Andrews; Gerard G. Bouffard; Chris Cheadle; Jining Lü; Kevin G. Becker; Brian Oliver

doi:10.1101/gr.159800

Genome Research, 2000, Vol 10 (12), pp. 2030-2043. doi:10.1101/gr.159800

Author(s): Justen Andrews, Gerard G. Bouffard, Chris Cheadle, Jining Lü, Kevin G. Becker, ...

Keyword(s): Microarray Analysis, Genomic Sequence, Sequence Data, Expression Profiles, Transcript Abundance, cDNA Libraries, Link Type, Microarray Expression, Expressed Sequence, Data Library

Identification and annotation of all the genes in the sequenced Drosophila genome is a work in progress. Wild-type testis function requires many genes and is thus of potentially high value for the identification of transcription units. We therefore undertook a survey of the repertoire of genes expressed in the Drosophila testis by computational and microarray analysis. We generated 3141 high-quality testis expressed sequence tags (ESTs). The testis ESTs computationally collapsed into a set of 1560 cDNAs used for further analysis. Of those, 11% correspond to named genes, and 33% provide biological evidence for a predicted gene. A surprising 47% fail to align with existing ESTs and 16% with predicted genes in the current genome release. EST frequency and microarray expression profiles indicate that the testis mRNA population is highly complex and shows an extended range of transcript abundance. Furthermore, >80% of the genes expressed in the testis showed onefold overexpression relative to ovaries, or gonadectomized flies. Additionally, >3% showed more than threefold overexpression at p < 0.05. Surprisingly, 22% of the genes most highly overexpressed in testis match Drosophila genomic sequence, but not predicted genes. These data strongly support the idea that sequencing additional cDNA libraries from defined tissues, such as testis, will be an important tool for refined annotation of the Drosophila genome. Additionally, these data suggest that the number of genes in Drosophila will significantly exceed the conservative estimate of 13,601. [The sequence data described in this paper have been submitted to the dbEST data library under accession nos. AI944400–AI947263 and BE661985–BE662262.] [The microarray data described in this paper have been submitted to the GEO data library under accession nos. GPLS, GSM3–GSM10.]

The Nuclear Receptor Superfamily Has Undergone Extensive Proliferation and Diversification in Nematodes

Genome Research, 1999, Vol 9 (2), pp. 103-120. doi:10.1101/gr.9.2.103. Cited By ~ 6

Author(s): Ann E. Sluder, Siuyien Wong Mathews, David Hough, Viravuth P. Yin, Claude V. Maina

Keyword(s): Nuclear Receptor, Genomic Sequence, Sequence Data, Single Species, Transcriptional Regulators, Caenorhabditis briggsae, C. elegans, Link Type, Data Library, Existing Data

The nuclear receptor (NR) superfamily is the most abundant class of transcriptional regulators encoded in the Caenorhabditis elegans genome, with >200 predicted genes revealed by the screens and analysis of genomic sequence reported here. This is the largest number of NR genes yet described from a single species, although our analysis of available genomic sequence from the related nematode Caenorhabditis briggsae indicates that it also has a large number. Existing data demonstrate expression for 25% of the C. elegans NR sequences. Sequence conservation and statistical arguments suggest that the majority represent functional genes. An analysis of these genes based on the DNA-binding domain motif revealed that several NR classes conserved in both vertebrates and insects are also represented among the nematode genes, consistent with the existence of ancient NR classes shared among most, and perhaps all, metazoans. Most of the nematode NR sequences, however, are distinct from those currently known in other phyla, and reveal a previously unobserved diversity within the NR superfamily. In C. elegans, extensive proliferation and diversification of NR sequences have occurred on chromosome V, accounting for >50% of the predicted NR genes. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF083222–AF083225 and AF083251–AF083234.]

Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

International Journal of Systematic and Evolutionary Microbiology, 2014, Vol 64 (Pt 2), pp. 316-324. doi:10.1099/ijs.0.054171-0. Cited By ~ 258

Author(s): Jongsik Chun, Fred A. Rainey

Keyword(s): Genomic Sequence, Sequence Data, Original Research, rRNA Gene, New Taxon, Genome Sequences, Microbial World, Content Type, Link Type, Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.

A compilation of soybean ESTs: generation and analysis

Genome, 2002, Vol 45 (2), pp. 329-338. doi:10.1139/g01-150. Cited By ~ 92

Author(s): Randy Shoemaker, Paul Keim, Lila Vodkin, Ernest Retzel, Sandra W Clifton, ...

Keyword(s): Genome Sequencing, Expression Profiles, Expression Patterns, cDNA Libraries, Sequence Length, Valuable Insight, Alternative Approach, Average Sequence Length, Expressed Sequence, Insight Into

Whole-genome sequencing is fundamental to understanding the genetic composition of an organism. Given the size and complexity of the soybean genome, an alternative approach is targeted random-gene sequencing, which provides an immediate and productive method of gene discovery. In this study, more than 120 000 soybean expressed sequence tags (ESTs) generated from more than 50 cDNA libraries were evaluated. These ESTs coalesced into 16 928 contigs and 17 336 singletons. On average, each contig was composed of 6 ESTs and spanned 788 bases. The average sequence length submitted to dbEST was 414 bases. Using only those libraries generating more than 800 ESTs each and only those contigs with 10 or more ESTs each, correlated patterns of gene expression among libraries and genes were discerned. Two-dimensional qualitative representations of contig and library similarities were generated based on expression profiles. Genes with similar expression patterns and, potentially, similar functions were identified. These studies provide a rich source of publicly available gene sequences as well as valuable insight into the structure, function, and evolution of a model crop legume genome.

Key words: Glycine max, genome sequencing, functional genomics.
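
As a rough check on the assembly figures quoted in this abstract, the short Python sketch below adds up the contigs and singletons. It assumes, for illustration only, that each contig and each singleton counts as one distinct sequence; the abstract does not state that total itself. The variable names are hypothetical.

```python
# Counts taken from the abstract above.
contigs = 16_928
singletons = 17_336
mean_ests_per_contig = 6  # rounded average reported in the abstract

# Assumption: each contig and each singleton counts as one distinct sequence.
distinct_sequences = contigs + singletons

# Rough estimate only: a rounded mean cannot reproduce the exact EST total.
approx_ests_assembled = contigs * mean_ests_per_contig + singletons

print(distinct_sequences)
print(approx_ests_assembled)
```

Because the per-contig average is rounded, the second figure is only an approximation and should not be expected to match the "more than 120 000" ESTs reported.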

Isolation of Zebrafish gdf7 and Comparative Genetic Mapping of Genes Belonging to the Growth/Differentiation Factor 5, 6, 7 Subgroup of the TGF-β Superfamily

Genome Research, 1999, Vol 9 (2), pp. 121-129. doi:10.1101/gr.9.2.121

Author(s): Alan J. Davidson, John H. Postlethwait, Yi-Lin Yan, David R. Beier, Cherie van Doren, ...

Keyword(s): Linkage Group, Sequence Data, Evolutionary Relationships, Mapping Data, Link Type, Differentiation Factor, Mammalian Genes, Comparative Genetic Mapping, Growth Differentiation Factor 5, Data Library

The Growth/differentiation factor (Gdf) 5, 6, 7 genes form a closely related subgroup belonging to the TGF-β superfamily. In zebrafish, there are three genes that belong to the Gdf5, 6, 7 subgroup that have been named radar, dynamo, and contact. The genes radar and dynamo both encode proteins most similar to mouse GDF6. The orthologous identity of these genes on the basis of amino acid similarities has not been clear. We have identified gdf7, a fourth zebrafish gene belonging to the Gdf5, 6, 7 subgroup. To assign correct orthologies and to investigate the evolutionary relationships of the human, mouse, and zebrafish Gdf5, 6, 7 subgroup, we have compared genetic map positions of the zebrafish and mammalian genes. We have mapped zebrafish gdf7 to linkage group (LG) 17, contact to LG9, GDF6 to human chromosome (Hsa) 8 and GDF7 to Hsa2p. The radar and dynamo genes have been localized previously to LG16 and LG19, respectively. A comparison of syntenies shared among human, mouse, and zebrafish genomes indicates that gdf7 is the ortholog of mammalian GDF7/Gdf7. LG16 shares syntenic relationships with mouse chromosome (Mmu) 4, including Gdf6. Portions of LG16 and LG19 appear to be duplicate chromosomes, thus suggesting that radar and dynamo are both orthologs of Gdf6. Finally, the mapping data are consistent with contact being the zebrafish ortholog of mammalian GDF5/Gdf5. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AF113022 and AF113023.]

Comparative Sequence of Human and Mouse BAC Clones from the mnd2 Region of Chromosome 2p13

Genome Research, 1999, Vol 9 (1), pp. 53-61. doi:10.1101/gr.9.1.53. Cited By ~ 9

Author(s): Wonhee Jang, Axin Hua, Sandra V. Spilson, Webb Miller, Bruce A. Roe, ...

Keyword(s): Genomic DNA, Genomic Sequence, Sequence Data, Lysyl Oxidase, Neuromuscular Disorder, BAC Clone, Link Type, Sequence Elements, Human And Mouse, Mouse Genomic

The mnd2 mutation on mouse chromosome 6 produces a progressive neuromuscular disorder. To determine the gene content of the 400-kb mnd2 nonrecombinant region, we sequenced 108 kb of mouse genomic DNA and 92 kb of human genomic sequence from the corresponding region of chromosome 2p13.3. Three genes with the indicated sizes and intergenic distances were identified: D6Mm5e (⩾81 kb)–787 bp–DOK (2 kb)–845 bp–LOR2 (⩾6 kb). D6Mm5e is expressed in many tissues at very low abundance and the predicted 526-residue protein contains no known functional domains. DOK encodes the p62dok rasGAP binding protein involved in signal transduction. LOR2 encodes a novel lysyl oxidase-related protein of 757 amino acid residues. We describe a simple search protocol for identification of conserved internal exons in genomic sequence. Evolutionary conservation proved to be a useful criterion for distinguishing between authentic exons and artifactual products obtained by exon amplification, RT–PCR, and 5′ RACE. Conserved noncoding sequence elements longer than 80 bp with ⩾75% nucleotide sequence identity comprise ∼1% of the genomic sequence in this region. Comparative analysis of this human and mouse genomic DNA sequence was an efficient method for gene identification and is independent of developmental stage or quantitative level of gene expression. [The sequence data described in this paper have been submitted to the GenBank data library under the following accession numbers: AC003061, mouse BAC clone 245c12; AC003065, human BAC clone h173(E10); AF053368, mouse Lor2 cDNA; AF084363, 108-kb contig from mouse BAC 245c12; AF084364, mouse D6Mm5e cDNA.]

In silico differential display of defense-related expressed sequence tags from sugarcane tissues infected with diazotrophic endophytes

Genetics and Molecular Biology, 2001, Vol 24 (1-4), pp. 103-111. doi:10.1590/s1415-47572001000100015. Cited By ~ 24

Author(s): Marcio R. Lambais

Keyword(s): Hierarchical Clustering, Expressed Sequence Tags, Expression Profiles, Expression Patterns, cDNA Libraries, A Genome, Wide Scale, Clustering Approach, Diazotrophic Endophytes, Expressed Sequence

The expression patterns of 277 sugarcane expressed sequence tag (EST) contigs encoding putative defense-related (DR) proteins were evaluated using the Sugarcane EST database. The DR proteins evaluated included chitinases, beta-1,3-glucanases, phenylalanine ammonia-lyases, chalcone synthases, chalcone isomerases, isoflavone reductases, hydroxyproline-rich glycoproteins, proline-rich glycoproteins, peroxidases, catalases, superoxide dismutases, WRKY-like transcription factors and proteins involved in cell death control. Putative sugarcane WRKY proteins were compared and their phylogenetic relationships determined. A hierarchical clustering approach was used to identify DR ESTs with similar expression profiles in representative cDNA libraries. To identify DR ESTs differentially expressed in sugarcane tissues infected with Gluconacetobacter diazotrophicus or Herbaspirillum rubrisubalbicans, 179 putative DR EST-contigs expressed in non-infected tissues (leaves and roots) and/or infected tissues were selected and arrayed by similarity of their expression profiles. Changes in the expression levels of 124 putative DR EST-contigs, expressed in non-infected tissues, were evaluated in infected tissues. Approximately 42% of these EST-contigs showed no expression in infected tissues, whereas 15% and 3% showed more than 2-fold suppression in tissues infected with G. diazotrophicus or H. rubrisubalbicans, respectively. Approximately 14% and 8% of the DR EST-contigs evaluated showed more than 2-fold induction in tissues infected with G. diazotrophicus or H. rubrisubalbicans, respectively. The differential expression of clusters of DR genes may be important in the establishment of a compatible interaction between sugarcane and diazotrophic endophytes. It is suggested that the hierarchical clustering approach can be used on a genome-wide scale to identify genes likely involved in controlling plant-microorganism interactions.

The Complex Repeats of Dictyostelium discoideum

Genome Research, 2001, Vol 11 (4), pp. 585-594. doi:10.1101/gr.162201

Author(s): Gernot Glöckner, Karol Szafranski, Thomas Winckler, Theodor Dingermann, Michael A. Quail, ...

Keyword(s): Transposable Elements, Dictyostelium discoideum, Copy Number, Sequence Data, Repetitive Elements, Data Resource, Valuable Data, Link Type, Small Complex, Data Library

In the course of determining the sequence of the Dictyostelium discoideum genome we have characterized in detail the quantity and nature of interspersed repetitive elements present in this species. Several of the most abundant small complex repeats and transposons (DIRS-1; TRE3-A,B; TRE5-A; skipper; Tdd-4; H3R) have been described previously. In our analysis we have identified additional elements. Thus, we can now present a complete list of complex repetitive elements in D. discoideum. All elements add up to 10% of the genome. Some of the newly described elements belong to established classes (TRE3-C,D; TRE5-B,C; DGLT-A,P; Tdd-5). However, we have also defined two new classes of DNA transposable elements (DDT and thug) that have not been described thus far. Based on the nucleotide amount, we calculated the minimum copy number in each family. These range from <10 to >200 copies. Unique sequences adjacent to the element ends and truncation points in elements gave a measure for the fragmentation of the elements. Furthermore, we describe the diversity of single elements with regard to polymorphisms and conserved structures. All elements show insertion preference into loci in which other elements of the same family reside. The analysis of the complex repeats is a valuable data resource for the ongoing assembly of whole D. discoideum chromosomes. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AF135841, AF298201, AF298202, AF298203, AF298204, AF298205, AF298206, AF298207, AF298208, AF298209, AF298210 and AF298624.]

Identification, Characterization, and Mapping of Expressed Sequence Tags from an Embryonic Zebrafish Heart cDNA Library

Genome Research, 2000, Vol 10 (12), pp. 1915-1927. doi:10.1101/gr.154000

Author(s): Christopher Ton, David M. Hwang, Adam A. Dempsey, Hong-Chang Tang, Jennifer Yoon, ...

Keyword(s): cDNA Library, Expressed Sequence Tags, Sequence Data, Cell Structure, Expression Patterns, Data Set, Link Type, Chromosome Segments, Expressed Sequence, Zebrafish Heart

The generation of expressed sequence tags (ESTs) has proven to be a rapid and economical approach by which to identify and characterize expressed genes. We generated 5102 ESTs from a 3-d-old embryonic zebrafish heart cDNA library. Of these, 57.6% matched known genes, 14.2% matched only other ESTs, and 27.8% showed no match to any ESTs or known genes. Clustering of all ESTs identified 359 unique clusters comprising 1771 ESTs, whereas the remaining 3331 ESTs did not cluster. Counting each cluster and each unclustered EST as one gene, this puts the number of unique genes identified in the data set at approximately 3690. A total of 1242 unique known genes were used to analyze the gene expression patterns in the zebrafish embryonic heart. These were sorted into seven categories on the basis of gene function. The largest class of genes represented those involved in gene/protein expression (25.9% of known transcripts). This class was followed by genes involved in metabolism (18.7%), cell structure/motility (16.4%), cell signaling and communication (9.6%), cell/organism defense (7.1%), and cell division (4.4%). Unclassified genes constituted the remaining 17.91%. Radiation hybrid mapping was performed for 102 ESTs and comparison of map positions between zebrafish and human identified new synteny groups. Continued comparative analysis will be useful in defining the boundaries of conserved chromosome segments between zebrafish and humans, which will facilitate the transfer of genetic information between the two organisms and improve our understanding of vertebrate evolution. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. BE693120–BE693210 and BE704450.]
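
The unique-gene estimate in this abstract can be reproduced with simple arithmetic. The Python sketch below is illustrative only: it assumes each cluster and each unclustered EST is counted as one gene, which is what the reported figures imply, and the variable names are hypothetical.

```python
# Counts taken from the abstract above.
total_ests = 5102
clusters = 359         # unique clusters reported by the authors
clustered_ests = 1771  # ESTs that fell into those clusters

# ESTs that did not join any cluster.
unclustered_ests = total_ests - clustered_ests

# Assumption: one gene per cluster plus one gene per unclustered EST.
estimated_unique_genes = clusters + unclustered_ests

print(unclustered_ests)        # 3331
print(estimated_unique_genes)  # 3690
```

Both printed values agree with the numbers stated in the abstract.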

Enabling Precision Medicine via standard communication of HTS provenance, analysis, and results

2017. doi:10.1101/191783

Author(s): Gil Alterovitz, Dennis Dean, Carole Goble, Michael R. Crusoe, Stian Soiland-Reyes, ...

Keyword(s): Precision Medicine, High Throughput Sequencing, Genomic Sequence, Sequence Data, Data Provenance, Provenance Analysis, Link Type, Sequencing Studies, Standardized Reporting, Personalized Approach

A personalized approach based on a patient’s or pathogen’s unique genomic sequence is the foundation of precision medicine. Genomic findings must be robust and reproducible, and experimental data capture should adhere to FAIR guiding principles. Moreover, effective precision medicine requires standardized reporting that extends beyond wet lab procedures to computational methods. The BioCompute framework (https://osf.io/zm97b/) enables standardized reporting of genomic sequence data provenance, including provenance domain, usability domain, execution domain, verification kit, and error domain. This framework facilitates communication and promotes interoperability. Bioinformatics computation instances that employ the BioCompute framework are easily relayed, repeated if needed and compared by scientists, regulators, test developers, and clinicians. Easing the burden of performing the aforementioned tasks greatly extends the range of practical application. Large clinical trials, precision medicine, and regulatory submissions require a set of agreed upon standards that ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCO) offer that standard, and are freely accessible as a GitHub organization (https://github.com/biocompute-objects) following the “Open-Stand.org principles for collaborative open standards development”. By communication of high-throughput sequencing studies using a BCO, regulatory agencies (e.g., FDA), diagnostic test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, potentially decreasing the time and cost associated with next generation sequencing workflow exchange, reporting, and regulatory reviews.

Genomes of the class Erysipelotrichia clarify the firmicute origin of the class Mollicutes

International Journal of Systematic and Evolutionary Microbiology, 2013, Vol 63 (Pt 7), pp. 2727-2741. doi:10.1099/ijs.0.048983-0. Cited By ~ 28

Author(s): James J. Davis, Fangfang Xia, Ross A. Overbeek, Gary J. Olsen

Keyword(s): Ribosomal Proteins, Genomic Sequence, Sequence Data, Taxonomic Revision, tRNA Synthetase, Microbial Evolution, 23S rRNA, Content Type, Link Type, Metabolic Functions

The tree of life is paramount for achieving an integrated understanding of microbial evolution and the relationships between physiology, genealogy and genomics. It provides the framework for interpreting environmental sequence data, whether applied to microbial ecology or to human health. However, there remain many instances where there is ambiguity in our understanding of the phylogeny of major lineages, and/or confounding nomenclature. Here we apply recent genomic sequence data to examine the evolutionary history of members of the classes Mollicutes (phylum Tenericutes) and Erysipelotrichia (phylum Firmicutes). Consistent with previous analyses, we find evidence of a specific relationship between them in molecular phylogenies and signatures of the 16S rRNA, 23S rRNA, ribosomal proteins and aminoacyl-tRNA synthetase proteins. Furthermore, by mapping functions over the phylogenetic tree we find that the erysipelotrichia lineages are involved in various stages of genomic reduction, having lost (often repeatedly) a variety of metabolic functions and the ability to form endospores. Although molecular phylogeny has driven numerous taxonomic revisions, we find it puzzling that the most recent taxonomic revision of the phyla Firmicutes and Tenericutes has further separated them into distinct phyla, rather than reflecting their common roots.
