G-OnRamp: Generating genome browsers to facilitate undergraduate-driven collaborative genome annotation

Mapping Intimacies

10.1101/781658

2019

Author(s): Luke Sargent, Yating Liu, Wilson Leung, Nathan T. Mortimer, David Lopatto, ...

Keyword(s): Genome Annotation, Gene Annotation, Sequence Similarity, Gene Prediction, Phenotypic Traits, Wasp Species, Major Barrier, Link Type, A Genome, Genome Browsers

Scientists are sequencing new genomes at an increasing rate with the goal of associating genome contents with phenotypic traits. After a new genome is sequenced and assembled, structural gene annotation is often the first step in analysis. Despite advances in computational gene prediction algorithms, most eukaryotic genomes still benefit from manual gene annotation. Undergraduates can become skilled annotators, and in the process learn both about genes and genomes and about how to use large datasets. Data visualizations provided by a genome browser are essential for manual gene annotation, enabling annotators to quickly evaluate multiple lines of evidence (e.g., sequence similarity, RNA-Seq, gene predictions, repeats). However, creating genome browsers requires extensive computational skills, and the lack of this expertise remains a major barrier for many biomedical researchers and educators. To address these challenges, the Genomics Education Partnership (GEP; https://gep.wustl.edu/) has partnered with the Galaxy Project (https://galaxyproject.org) to develop G-OnRamp (http://g-onramp.org), a web-based platform for creating UCSC Assembly Hubs and JBrowse genome browsers. G-OnRamp can also convert a JBrowse instance into an Apollo instance for collaborative genome annotation in research and educational settings. G-OnRamp enables researchers to easily visualize their experimental results, educators to create Course-based Undergraduate Research Experiences (CUREs) centered on genome annotation, and students to participate in genomics research. Development of G-OnRamp was guided by extensive user feedback from in-person workshops. Sixty-five researchers and educators from over 40 institutions participated in these workshops, which produced over 20 genome browsers now available for research and education.
For example, genome browsers for four parasitoid wasp species were used in a CURE engaging 142 students taught by 13 faculty members, producing a total of 192 gene models. G-OnRamp can be deployed on a personal computer or on cloud computing platforms, and the genome browsers it produces can be transferred to the CyVerse Data Store for long-term access.
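To make "creating a UCSC Assembly Hub" concrete: a hub is a small set of plain-text control files that point at the assembly (.2bit) and at binary evidence tracks (bigBed, bigWig). The sketch below is our own illustration, not G-OnRamp's code; it only renders the three text files for one assembly. The hub name, assembly name, email, default position and track files are hypothetical placeholders, the key names follow our reading of UCSC's hub documentation (check the current spec before relying on them), and the .2bit and track files themselves are not generated here.

```python
from pathlib import Path


def render_assembly_hub(hub_name, genome, organism, twobit, email, default_pos, tracks):
    """Return {relative_path: text} for a minimal single-assembly hub.

    `tracks` is a list of (name, bigDataUrl, type, label) tuples. Every value
    passed in is a placeholder chosen by the caller, not G-OnRamp output.
    """
    hub_txt = (
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {organism} assembly hub\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    # twoBitPath points at the assembly itself, which is what makes this an
    # assembly hub rather than a track hub on a UCSC-hosted genome
    genomes_txt = (
        f"genome {genome}\n"
        f"trackDb {genome}/trackDb.txt\n"
        f"twoBitPath {genome}/{twobit}\n"
        f"organism {organism}\n"
        f"scientificName {organism}\n"
        f"description {organism} draft assembly\n"
        f"defaultPos {default_pos}\n"
        "orderKey 1\n"
    )
    # One stanza per evidence track; stanzas are separated by a blank line
    stanzas = [
        f"track {name}\n"
        f"bigDataUrl {url}\n"
        f"shortLabel {label}\n"
        f"longLabel {label}\n"
        f"type {track_type}\n"
        "visibility dense\n"
        for name, url, track_type, label in tracks
    ]
    return {
        "hub.txt": hub_txt,
        "genomes.txt": genomes_txt,
        f"{genome}/trackDb.txt": "\n".join(stanzas),
    }


def write_hub(files, out_dir):
    """Write rendered files below out_dir, creating subdirectories as needed."""
    for rel_path, text in files.items():
        path = Path(out_dir) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


# Hypothetical example: one wasp assembly with gene-prediction and RNA-Seq tracks
files = render_assembly_hub(
    "waspHub", "wasp1", "Example wasp", "wasp1.2bit", "curator@example.org",
    "scaffold_1:1-10000",
    [("genePred", "genes.bb", "bigBed 12", "Gene predictions"),
     ("rnaSeq", "coverage.bw", "bigWig", "RNA-Seq coverage")],
)
print(files["hub.txt"])
```

Rendering and writing are kept separate so the text can be inspected or tested before anything touches disk; a platform such as G-OnRamp also has to convert the evidence (gene predictions, alignments, RNA-Seq) into the binary track formats, which this sketch leaves out.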

Nonomuraea montanisoli sp. nov., isolated from mountain forest soil

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijsem.0.004695

2021

Author(s): Suchart Chanama, Chanwit Suriyachadkun, Manee Chanama

Keyword(s): 16S rRNA, 16S rRNA Gene, Related Species, Sequence Similarity, Diaminopimelic Acid, Mountain Forest, rRNA Gene, Content Type, Link Type, A Genome

A novel actinomycete, strain SMC 257T, was isolated from a soil sample collected from a mountain forest in Nan Province, Thailand. Strain SMC 257T formed tightly closed spiral spore chains on aerial mycelia. A polyphasic approach was used for the taxonomic study of this strain. Phylogenetic analysis based on 16S rRNA gene sequences indicated that strain SMC 257T belonged to the genus Nonomuraea, and the closest phylogenetically related species were Nonomuraea roseoviolacea subsp. carminata JCM 9946T (98.9 % 16S rRNA gene sequence similarity), Nonomuraea rhodomycinica TBRC 6557T (98.4 %) and Nonomuraea roseoviolacea subsp. roseoviolacea JCM 3145T (98.3 %). Genome sequencing revealed a genome size of 9.76 Mbp and a G+C content of 72.3 mol%. The genome average nucleotide identity (ANI) and digital DNA–DNA hybridization (dDDH) values between this novel strain and its closest related species were below the species boundaries of 95–96 % and 70 %, respectively. The cell-wall peptidoglycan contained meso-diaminopimelic acid. The whole-cell sugars were glucose, ribose, madurose and mannose. The major menaquinone was MK-9(H4). The polar lipid profile consisted of phosphatidylethanolamine, hydroxyphosphatidylethanolamine, lysophosphatidylethanolamine, diphosphatidylglycerol, N-phosphatidylglycerol, phosphatidylinositol and phosphatidylinositol mannosides. The predominant cellular fatty acids were C17 : 0 10-methyl and iso-C16 : 0. Based on comparative analysis of phenotypic, chemotaxonomic and genotypic data, strain SMC 257T is considered to represent a novel species of the genus Nonomuraea, for which the name Nonomuraea montanisoli is proposed. The type strain is SMC 257T (=TBRC 13065T=NBRC 114772T).
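Genome-derived G+C content, reported here as 72.3 mol%, is the proportion of G and C among the sequenced bases. A minimal sketch of that calculation, assuming the assembly is available as contig strings; ignoring N and other ambiguity codes and pooling all contigs (rather than averaging per contig) are our choices, not necessarily the authors'.

```python
def gc_mol_percent(seq: str) -> float:
    """G+C content (mol%) of a nucleotide sequence, counting only A/C/G/T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


def genome_gc(contigs: dict[str, str]) -> float:
    """Pooled G+C over all contigs, so long contigs weigh more than short ones."""
    return gc_mol_percent("".join(contigs.values()))


print(round(gc_mol_percent("GGCCATNN"), 1))  # -> 66.7 (the two Ns are ignored)
```

Pooling matters for fragmented draft assemblies: averaging per-contig values would let many short contigs skew the estimate.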

Hansschlegelia quercus sp. nov., a novel methylotrophic bacterium isolated from oak buds

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijsem.0.004323

2020

Vol 70 (8)

pp. 4646-4652

Cited By ~ 5

Author(s): Nadezhda V. Agafonova, Elena N. Kaparullina, Denis S. Grouzdev, Nina V. Doronina

Keyword(s): 16S rRNA, 16S rRNA Gene, Type Species, Sequence Similarity, Methylotrophic Bacterium, rRNA Gene, Methylotrophic Bacteria, Content Type, Link Type, A Genome

Novel aerobic, restricted facultatively methylotrophic bacteria were isolated from buds of English oak (Quercus robur L.; strain DubT) and northern red oak (Quercus rubra L.; strain KrD). The isolates were Gram-negative, asporogenous, motile short rods that multiplied by binary fission. They utilized methanol, methylamine and a few polycarbon compounds as carbon and energy sources. Optimal growth occurred at 25 °C and pH 7.5. The dominant phospholipids were phosphatidylethanolamine, phosphatidylcholine, diphosphatidylglycerol and phosphatidylglycerol. The major cellular fatty acids were C18 : 1 ω7c, 11-methyl C18 : 1 ω7c and C16 : 0. The major ubiquinone was Q-10. Analysis of 16S rRNA gene sequences showed that the strains were closely related to members of the genus Hansschlegelia: Hansschlegelia zhihuaiae S113T (97.5–98.0 %), Hansschlegelia plantiphila S1T (97.4–97.6 %) and Hansschlegelia beijingensis PG04T (97.0–97.2 %). The 16S rRNA gene sequence similarity between strains DubT and KrD was 99.7 %, and the DNA–DNA hybridization (DDH) value between the strains was 85 %. The ANI and DDH values between strain DubT and H. zhihuaiae S113T were 80.1 and 21.5 %, respectively. Genome sequencing of strain DubT revealed a genome size of 3.57 Mbp and a G+C content of 67.0 mol%. Based on the results of the phenotypic, chemotaxonomic and genotypic analyses, it is proposed that the isolates be assigned to the genus Hansschlegelia as Hansschlegelia quercus sp. nov., with the type strain DubT (=VKM B-3284T=CCUG 73648T=JCM 33463T).
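The species decisions in this description rest on widely used genome-relatedness cut-offs: ANI of about 95–96 % and (d)DDH of 70 %. A minimal sketch of that decision rule applied to the values reported above; treating the whole 95–96 % ANI band as a grey zone, using "≥" at the boundaries, and calling disagreeing metrics "borderline" are our simplifying assumptions, not a published procedure.

```python
from typing import Optional

ANI_LOW, ANI_HIGH = 95.0, 96.0  # commonly cited ANI species boundary, 95-96 %
DDH_CUTOFF = 70.0               # classical (d)DDH species boundary, 70 %


def species_call(ani: Optional[float] = None, ddh: Optional[float] = None) -> str:
    """Classify a strain pair from ANI and/or (d)DDH percentages.

    Returns "same species", "different species" or "borderline". Values
    inside the 95-96 % ANI grey zone, or metrics that disagree, are borderline.
    """
    votes = []
    if ani is not None:
        if ani >= ANI_HIGH:
            votes.append("same")
        elif ani < ANI_LOW:
            votes.append("different")
        else:
            votes.append("borderline")
    if ddh is not None:
        votes.append("same" if ddh >= DDH_CUTOFF else "different")
    if not votes:
        raise ValueError("supply at least one of ani or ddh")
    if len(set(votes)) == 1 and votes[0] != "borderline":
        return f"{votes[0]} species"
    return "borderline"


# Values reported in the abstract above
print(species_call(ani=80.1, ddh=21.5))  # DubT vs H. zhihuaiae S113T -> different species
print(species_call(ddh=85.0))            # DubT vs KrD (no ANI reported) -> same species
```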

Pseudoalteromonas ostreae sp. nov., a new bacterial species harboured by the flat oyster Ostrea edulis

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijsem.0.005070

2021

Vol 71 (11)

Author(s): Héléna Cuny, Clément Offret, Amine M. Boukerb, Leila Parizadeh, Olivier Lesouhaitier, ...

Keyword(s): Type Species, Bacterial Species, Acid Methyl Ester, Phenotypic Traits, rRNA Gene, Ostrea edulis, Content Type, Link Type, A Genome, Flat Oyster

Three bacterial strains, named hOe-66T, hOe-124 and hOe-125, were isolated from the haemolymph of different specimens of the flat oyster Ostrea edulis collected in Concarneau bay (Finistère, France). These strains were characterized by a polyphasic approach, including (i) whole-genome analyses with 16S rRNA gene sequence alignment and pangenome analysis, determination of the G+C content, average nucleotide identity (ANI) and in silico DNA–DNA hybridization (isDDH), and (ii) fatty acid methyl ester and other phenotypic analyses. Strains hOe-66T, hOe-124 and hOe-125 were closely related to both type strains Pseudoalteromonas rhizosphaerae RA15T and Pseudoalteromonas neustonica PAMC 28425T, with ANI and isDDH values of less than 93.3 % and 52.3 %, respectively. Regarding their phenotypic traits, the three strains were Gram-negative, rod-shaped (1–2 µm), aerobic, motile and non-spore-forming bacteria. Cells grew optimally at 25 °C, in 2.5 % NaCl and at pH 7–8. The most abundant fatty acids were summed feature 3 (C16:1 ω7c/C16:1 ω6c), C16:0 and C17:1 ω8c. The strains had an average genome size of 4.64 Mb and a G+C content of 40.28 mol%. The genetic and phenotypic results suggested that strains hOe-66T, hOe-124 and hOe-125 belong to a new species of the genus Pseudoalteromonas. In this context, we propose the name Pseudoalteromonas ostreae sp. nov. The type strain is hOe-66T (=CECT 30303T=CIP 111911T).
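Pangenome analysis, one of the genome analyses listed above, partitions gene families by how many of the compared genomes carry them. A minimal sketch, assuming gene families have already been clustered by an orthology tool and are given as sets of family IDs per strain; the family IDs below are invented, and the three-way core/shell/unique split is a common convention rather than the authors' stated method.

```python
from collections import Counter


def partition_pangenome(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into core (in every genome), shell (in more than
    one but not all) and unique (in exactly one genome)."""
    counts = Counter(fam for fams in genomes.values() for fam in fams)
    n = len(genomes)
    return {
        "core": {f for f, c in counts.items() if c == n},
        "shell": {f for f, c in counts.items() if 1 < c < n},
        "unique": {f for f, c in counts.items() if c == 1},
    }


# Hypothetical gene-family content for the three isolates
strains = {
    "hOe-66T": {"famA", "famB", "famC", "famD"},
    "hOe-124": {"famA", "famB", "famC"},
    "hOe-125": {"famA", "famB", "famE"},
}
parts = partition_pangenome(strains)
print(sorted(parts["core"]), sorted(parts["shell"]), sorted(parts["unique"]))
```

With only one genome every family would count as both core and unique, so the split is only meaningful for two or more strains.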

A modified GC-specific MAKER gene annotation method reveals improved and novel gene predictions of high and low GC content in Oryza sativa

10.1101/115345

2017

Author(s): Megan J. Bowman, Jane A. Pulman, Tiffany L. Liu, Kevin L. Childs

Keyword(s): Oryza sativa, Gene Annotation, Gene Prediction, Biological Significance, GC Content, Training Data, Structural Annotation, Gene Variation, A Genome, Grass Genomes

Accurate structural annotation depends on well-trained gene prediction programs. Training data for gene prediction programs are often chosen randomly from a subset of high-quality genes that ideally represent the variation found within a genome. One aspect of gene variation is GC content, which differs across species and is bimodal in grass genomes. We find that gene prediction programs trained on genes with random GC content do not fully predict grass genes with extreme GC content. We present a new GC-specific MAKER annotation protocol to predict new and improved gene models, and we assess the biological significance of this method in Oryza sativa.
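The core idea is to stop drawing training genes at random with respect to GC content. One way to sketch this, assuming a simple split at a GC threshold: compute the GC fraction of each candidate training gene and divide the candidates into low- and high-GC sets, each of which would then train its own predictor. The 0.55 threshold, gene IDs and sequences are hypothetical illustrations, not values from the paper, and the MAKER runs themselves are not shown.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among A/C/G/T characters (ambiguous bases ignored)."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    acgt = gc + sum(seq.count(b) for b in "AT")
    return gc / acgt if acgt else 0.0


def split_training_genes(cds: dict[str, str], threshold: float = 0.55):
    """Partition candidate training genes into low- and high-GC sets.

    The threshold is a hypothetical cut-off; in practice it would be chosen
    from the (bimodal) GC distribution of the genome's own genes.
    """
    low, high = {}, {}
    for gene_id, seq in cds.items():
        target = high if gc_fraction(seq) >= threshold else low
        target[gene_id] = seq
    return low, high


# Hypothetical candidate coding sequences
candidates = {
    "geneA": "ATGAAATTTAAATAA",  # AT-rich
    "geneB": "ATGGCCGCGGGCTGA",  # GC-rich
    "geneC": "ATGCGCGCCGCGTAG",  # GC-rich
}
low, high = split_training_genes(candidates)
print(sorted(low), sorted(high))
```

Each subset would then be used to train a separate gene-prediction model, so that genes at either extreme of the GC distribution are represented in at least one training set.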

Annotation Transfer for Genomics: Measuring Functional Divergence in Multi-Domain Proteins

Genome Research

10.1101/gr.183801

2001

Vol 11 (10)

pp. 1632-1640

Author(s): Hedi Hegyi, Mark Gerstein

Keyword(s): Genome Annotation, Large Scale, Sequence Similarity, Functional Divergence, Open Reading Frames, Single Domain, Functional Conservation, Link Type, Approximate Function, The Relationship

Annotation transfer is a principal process in genome annotation. It involves "transferring" structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins that are similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using SCOP superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; in the case of complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) than for pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.
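The survey's central distinction is whether two proteins share just one superfamily or the exact same combination of superfamilies. A minimal sketch of that comparison, representing each protein's domain architecture as a list of SCOP superfamily labels and treating "same combination" as the same multiset of labels (order ignored, which is our assumption). The labels are placeholders, and the percentages in the comments are quoted from the abstract, not computed by the code.

```python
from collections import Counter


def domain_relationship(arch_a: list[str], arch_b: list[str]) -> str:
    """Classify two domain architectures given as superfamily-label lists."""
    if not arch_a or not arch_b:
        raise ValueError("each protein needs at least one assigned domain")
    if not set(arch_a) & set(arch_b):
        return "no shared superfamily"
    if len(arch_a) == 1 and len(arch_b) == 1:
        # abstract: ~67 % of such single-domain pairs share approximate function
        return "single-domain, same superfamily"
    if Counter(arch_a) == Counter(arch_b):
        # abstract: ~80 % for the same two-superfamily combination,
        # >90 % with complete coverage along both full-length proteins
        return "same domain combination"
    # abstract: only ~35 % when multi-domain proteins share one superfamily
    return "partial overlap"


# Hypothetical superfamily labels
print(domain_relationship(["SF_kinase", "SF_SH2"], ["SF_SH2", "SF_kinase"]))
print(domain_relationship(["SF_kinase", "SF_SH2"], ["SF_kinase", "SF_PH"]))
```

A pair whose architectures fall into "partial overlap" is the case where, by the abstract's figures, transferring function on the basis of one shared superfamily is least reliable.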

immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking

10.1101/759795

2019

Cited By ~ 3

Author(s): Cédric R. Weber, Rahmad Akbar, Alexander Yermanos, Milena Pavlović, Igor Snapkov, ...

Keyword(s): T Cell, T Cell Receptor, Network Architecture, Gene Annotation, Sequence Similarity, Cell Receptor, Germline Gene, Immune Receptor, Link Type, Estimation Sequence

Summary: B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground-truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable-region immune receptor sequences. immuneSIM enables tuning of the following immune receptor features: (i) species and chain type (BCR, TCR, single, paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation, and (vi) sequence motifs. Each simulated sequence is annotated with the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis, such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis, and machine learning methods for motif detection. Availability: The package is available via https://github.com/GreiffLab/immuneSIM and will also be available on CRAN (submitted). The documentation is hosted at https://[email protected], [email protected]
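immuneSIM itself is an R package; what follows is only a toy Python illustration of the general idea of simulating a receptor sequence while recording every event that produced it (germline segment choice, end trimming, non-templated N-insertions), so that the output carries its own ground truth. The germline "segments" are short made-up strings rather than real immunoglobulin or TCR genes, none of the names correspond to immuneSIM's API, and features such as somatic hypermutation and clonal abundance are omitted.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy germline segments (NOT real V/D/J gene sequences)
V_GENES = {"V1": "TGTGCGAGAGA", "V2": "TGTGCCAGCAGT"}
D_GENES = {"D1": "GGTATAGCAGC", "D2": "GACTACGGTGAC"}
J_GENES = {"J1": "ACTACTTTGACTAC", "J2": "TGCTTTTGATATC"}


@dataclass
class SimulatedSequence:
    sequence: str
    events: dict = field(default_factory=dict)


def random_nts(rng: random.Random, n: int) -> str:
    """n random non-templated nucleotides."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def simulate_vdj(rng: random.Random, max_trim: int = 3, max_ins: int = 4) -> SimulatedSequence:
    """Join one V, D and J segment with end trimming and N-insertions,
    recording each random choice alongside the resulting sequence."""
    v, d, j = rng.choice(list(V_GENES)), rng.choice(list(D_GENES)), rng.choice(list(J_GENES))
    v_del, d5_del, d3_del, j_del = (rng.randint(0, max_trim) for _ in range(4))
    n1 = random_nts(rng, rng.randint(0, max_ins))
    n2 = random_nts(rng, rng.randint(0, max_ins))
    v_seq = V_GENES[v][: len(V_GENES[v]) - v_del]          # trim V 3' end
    d_seq = D_GENES[d][d5_del: len(D_GENES[d]) - d3_del]   # trim both D ends
    j_seq = J_GENES[j][j_del:]                              # trim J 5' end
    events = {"V": v, "D": d, "J": j, "V_3del": v_del, "D_5del": d5_del,
              "D_3del": d3_del, "J_5del": j_del, "N1": n1, "N2": n2}
    return SimulatedSequence(v_seq + n1 + d_seq + n2 + j_seq, events)


rng = random.Random(42)  # fixed seed so the benchmark set is reproducible
repertoire = [simulate_vdj(rng) for _ in range(5)]
for s in repertoire:
    print(s.sequence, s.events["V"], s.events["D"], s.events["J"])
```

Because every event is stored, a germline-annotation tool run on `repertoire` can be scored directly against `events["V"]`, `events["D"]` and `events["J"]`.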

Nonomuraea nitratireducens sp. nov., a new actinobacterium isolated from Suaeda australis Moq. rhizosphere

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijsem.0.004377

2020

Vol 70 (9)

pp. 5026-5031

Cited By ~ 4

Author(s): Yixin Ou, Yong Sheng, Xiaojing Hu, Dongjin Leng, Jiafu Huang, ...

Keyword(s): Type Species, Sequence Similarity, Diaminopimelic Acid, rRNA Gene, Content Type, Link Type, Reduction Activity, A Genome, PR China, Physiological Evaluation

A novel actinomycete, designated WYY166T, was isolated from the rhizosphere of Suaeda australis Moq. collected in Dongfang, PR China. The taxonomic position of this strain was investigated using a polyphasic approach. Phylogenetic analysis based on its 16S rRNA gene sequence placed strain WYY166T in the genus Nonomuraea, and it was most closely related to the type strains Nonomuraea candida HMC10T, Nonomuraea turkmeniaca DSM 43926T, Nonomuraea maritima NBRC 106687T and Nonomuraea polychroma DSM 43925T (98.35, 97.60, 97.36 and 97.30 % sequence similarity, respectively). Genome sequencing revealed a genome size of 11.27 Mbp and a G+C content of 71.10 mol%. The genome average nucleotide identity (ANI) and digital DNA–DNA hybridization (dDDH) values between strain WYY166T and the other species of the genus were low (ANI 81.63–85.23 %, dDDH 23.6–31.6 %), suggesting that it represents a new species. Physiological evaluation showed that it had remarkable nitrate reduction activity. The whole-cell hydrolysates contained meso-diaminopimelic acid and madurose. The N-acyl type of muramic acid was acetyl. The major menaquinones were MK-9(H4) (86.9 %) and MK-9(H2) (13.1 %). The predominant fatty acids were iso-C16 : 0 (53.2 %), 10-methyl C17 : 0 (10.7 %), C17 : 1 ω6c (8.3 %) and iso-C16 : 1 h (7.3 %). These physiological, biochemical and chemotaxonomic data suggested that strain WYY166T should be classified as representing a novel species of the genus Nonomuraea, for which the name Nonomuraea nitratireducens sp. nov. is proposed. The type strain is WYY166T (=MCCC 1K03779T=KCTC 49343T).

Description of Xenorhabdus magdalenensis sp. nov., the symbiotic bacterium associated with Steinernema australe

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijs.0.034322-0

2012

Vol 62 (Pt 8)

pp. 1761-1765

Cited By ~ 22

Author(s): Patrick Tailliez, Sylvie Pagès, Steve Edgington, Lukasz M. Tymo, Alan G. Buddie

Keyword(s): Type Species, Sequence Similarity, Symbiotic Bacterium, Phylogenetic Position, Phenotypic Traits, rRNA Gene, Content Type, Link Type, Bacterium Strain, The Difference

A symbiotic bacterium, strain IMI 397775T, was isolated from the insect-pathogenic nematode Steinernema australe. On the basis of 16S rRNA gene sequence similarity, this bacterial isolate was shown to belong to the genus Xenorhabdus, in agreement with the genus of its nematode host. The accurate phylogenetic position of this new isolate was defined using a multigene approach, which showed that isolate IMI 397775T shares a common ancestor with Xenorhabdus doucetiae FRM16T and Xenorhabdus romanii PR06-AT, the symbiotic bacteria associated with Steinernema diaprepesi and Steinernema puertoricense, respectively. Nucleotide identity of less than 97 % between isolate IMI 397775T and each of X. doucetiae FRM16T and X. romanii PR06-AT, calculated over the concatenated sequences of five gene fragments (4275 nt in total), together with several phenotypic traits and the difference in the upper temperature limit for growth of these three bacteria, allowed genetic and phenotypic differentiation of isolate IMI 397775T from the two closely related species. Strain IMI 397775T therefore represents a novel species, for which the name Xenorhabdus magdalenensis sp. nov. is proposed, with the type strain IMI 397775T (=DSM 24915T).
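The genetic differentiation above uses nucleotide identity over five concatenated gene fragments. A sketch of that calculation, assuming each gene fragment has already been aligned across strains and is supplied in the same gene order for every strain; skipping columns with a gap in either sequence is our choice (other gap treatments give slightly different values), and the short fragments are invented.

```python
def concatenate(fragments: list[str]) -> str:
    """Join per-gene aligned fragments in a fixed gene order."""
    return "".join(fragments)


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments of two loci for two strains
strain_1 = concatenate(["ATGCCGTA", "GGC-TTAA"])
strain_2 = concatenate(["ATGCTGTA", "GGCATTGA"])
print(round(percent_identity(strain_1, strain_2), 1))  # -> 86.7 (13 of 15 ungapped columns)
```

Concatenating before computing identity weights each gene by its aligned length, so the longest fragments dominate the overall value.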

Exiguobacterium enclense sp. nov., isolated from sediment

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijs.0.000149

2015

Vol 65 (Pt 5)

pp. 1611-1616

Cited By ~ 14

Author(s): Syed G. Dastager, Rahul Mawlankar, Vidya V. Sonalkar, Meghana N. Thorat, Poonam Mual, ...

Keyword(s): 16S rRNA, 16S rRNA Gene, Type Species, Sequence Similarity, Phenotypic Traits, rRNA Gene, Closely Related Species, Content Type, Link Type, Marine Sediment Sample

A Gram-stain-positive bacterium, designated strain NIO-1109T, was isolated from a marine sediment sample from Chorao Island, Goa, India. Phenotypic and chemotaxonomic characteristics and data from phylogenetic analysis based on 16S rRNA gene sequences indicated that strain NIO-1109T was related to the genus Exiguobacterium. Strain NIO-1109T exhibited >98.0 % 16S rRNA gene sequence similarity to Exiguobacterium indicum HHS 31T (99.5 %) and Exiguobacterium acetylicum NCIMB 9889T (99.1 %); the type strains of other species showed <98 % similarity. Levels of DNA–DNA relatedness between strain NIO-1109T and E. acetylicum DSM 20416T and E. indicum LMG 23471T were less than 70 % (33.0±2.0 and 37±3.2 %, respectively). Strain NIO-1109T also differed from these two closely related species in a number of phenotypic traits. Based on phenotypic, chemotaxonomic and phylogenetic data, strain NIO-1109T is considered to represent a novel species of the genus Exiguobacterium, for which the name Exiguobacterium enclense sp. nov. is proposed. The type strain is NIO-1109T (=NCIM 5457T=DSM 25128T=CCTCC AB 2011124T).
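The abstract follows a common two-step screen: 16S rRNA gene similarity shortlists the type strains above 98.0 %, and only those are then compared by DNA–DNA relatedness against the 70 % cut-off. A sketch of that screen using the similarity and relatedness values reported above, keyed by species name; the third entry is a hypothetical placeholder standing in for the type strains below 98 %.

```python
SIXTEEN_S_CUTOFF = 98.0  # shortlist threshold used in the abstract (%)
DDH_CUTOFF = 70.0        # DNA-DNA relatedness species boundary (%)


def shortlist(similarities: dict[str, float], cutoff: float = SIXTEEN_S_CUTOFF) -> list[str]:
    """Type strains whose 16S similarity exceeds the cutoff, most similar first."""
    hits = [(sim, name) for name, sim in similarities.items() if sim > cutoff]
    return [name for sim, name in sorted(hits, reverse=True)]


def distinct_from_all(ddh: dict[str, float], candidates: list[str]) -> bool:
    """True if every shortlisted strain has DNA-DNA relatedness below the cut-off."""
    missing = [c for c in candidates if c not in ddh]
    if missing:
        raise KeyError(f"no DNA-DNA relatedness value for: {missing}")
    return all(ddh[c] < DDH_CUTOFF for c in candidates)


sim_16s = {
    "E. indicum": 99.5,
    "E. acetylicum": 99.1,
    "other type strain (hypothetical)": 97.5,
}
ddh = {"E. indicum": 37.0, "E. acetylicum": 33.0}

to_test = shortlist(sim_16s)
print(to_test, distinct_from_all(ddh, to_test))
```

Raising on a missing value, rather than silently skipping it, reflects the logic of the screen: a shortlisted strain without a relatedness measurement leaves the novelty claim untested.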

Pontimicrobium aquaticum gen. nov., sp. nov., a bacterium in the family Flavobacteriaceae isolated from seawater

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY

10.1099/ijsem.0.004314

2020

Vol 70 (8)

pp. 4562-4568

Cited By ~ 5

Author(s): Thidarat Janthra, Jihye Baek, Jong-Hwa Kim, Jung-Hoon Yoon, Ampaitip Sukhoom, ...

Keyword(s): Sequence Similarity, rRNA Gene, Strictly Aerobic, Unidentified Phospholipid, Respiratory Quinone, Content Type, Link Type, A Genome, The Family, Distinct Lineage

A Gram-stain-negative, yellow-pigmented, non-spore-forming, non-motile, rod-shaped, catalase-positive, strictly aerobic bacterial strain, designated CAU 1491T, was isolated from seawater, and its taxonomic position was examined using a polyphasic approach. Cells of strain CAU 1491T grew optimally at 30 °C, pH 7.5 and in 2.0 % (w/v) NaCl. Phylogenetic analysis based on the 16S rRNA gene sequence of CAU 1491T showed that it formed a distinct lineage within the family Flavobacteriaceae as a separate deep branch, with 97.0 % or lower sequence similarity to representatives of the genera Lacinutrix, Gaetbulibacter and Aquibacter. The major cellular fatty acids of strain CAU 1491T were iso-C15 : 0, iso-C15 : 1 G, iso-C17 : 0 3-OH and summed feature 3. The polar lipid pattern consisted of diphosphatidylglycerol, phosphatidylserine, phosphatidylethanolamine and an unidentified phospholipid. The strain contained MK-6 as the sole respiratory quinone. Genome sequencing revealed that strain CAU 1491T has a genome size of 3.13 Mbp and a G+C content of 32.4 mol%. On the basis of the phenotypic, chemotaxonomic and genomic data, strain CAU 1491T represents a new genus and species in the family Flavobacteriaceae, for which the name Pontimicrobium aquaticum gen. nov., sp. nov. is proposed. The type strain of Pontimicrobium aquaticum is CAU 1491T (=KCTC 72003T=NBRC 113695T).