Genome Informatics
Recently Published Documents

Total documents: 119 (five years: 2)
H-index: 14 (five years: 0)

2021. Author(s): Martin Ringwald, Joel E. Richardson, Richard M. Baldarelli, Judith A. Blake, James A. Kadin, ...

The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI's mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models of human diseases. MGI maintains mouse anatomy and phenotype ontologies, contributes to the development of the Gene Ontology and Disease Ontology, and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI's two major knowledgebases. Here, we highlight recent changes and enhancements to MGD and GXD that have been implemented in response to the changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org.


2021. Author(s): M. N. Perry, C. L. Smith

In addition to naturally occurring sequence variation and spontaneous mutations, a wide array of technologies exists for modifying the mouse genome. Standardized nomenclature (including allele, transgene, and other mutation nomenclature) and persistent unique identifiers (PUIDs) are critical for effective scientific communication, comparison of results, and integration of data into knowledgebases such as Mouse Genome Informatics (MGI), the Alliance of Genome Resources, and the International Mouse Strain Resource (IMSR). Besides being the authoritative source for mouse gene, allele, and strain nomenclature, MGI integrates published and unpublished genomic, phenotypic, and expression data and links to other online resources, giving a complete view of the mouse as a model organism. The International Committee on Standardized Genetic Nomenclature for Mice has developed allele nomenclature rules and guidelines that take into account the number of genes affected, the method of allele generation, and the nature of the sequence alteration. To capture details that cannot be included in allele symbols, MGI has further developed allele-to-gene relationships, based on Sequence Ontology (SO) definitions for mutations, that link alleles to the genes they affect. MGI is also adopting Human Genome Variation Society (HGVS) nomenclature for variants associated with alleles, which will enhance searching for mutations and improve cross-species comparison. Because they allow unique, informative symbols to be assigned and alleles to be linked with more than one gene, the allele and transgene nomenclature rules and guidelines provide an unambiguous way to represent alterations in the mouse genome and facilitate data integration among multiple resources such as the Alliance of Genome Resources and IMSR.
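In plain text, MGI allele symbols are often written as the gene symbol followed by the superscripted allele designation in angle brackets, for example Trp53<tm1Tyj>, where tm marks a targeted mutation, 1 is a serial number, and Tyj is a laboratory code. The Python sketch below splits such strings into their parts. It is a hypothetical helper, not an MGI tool: the regular expressions and the two recognized prefixes (tm, em) are assumptions that cover only a few common patterns, and transgene (Tg) symbols are deliberately out of scope.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Plain-text allele symbol: gene symbol followed by the superscript in <...>.
ALLELE_RE = re.compile(r"^(?P<gene>[^<>\s]+)<(?P<sup>[^<>\s]+)>$")

# Assumed tm/em pattern: prefix, serial number, optional variant suffix
# (e.g. "a" or ".1"), optional tag in parentheses, then a lab code.
MUTATION_RE = re.compile(
    r"^(?P<kind>tm|em)(?P<serial>\d+)(?P<suffix>\.\d+|[a-z])?"
    r"(?:\((?P<tag>[^()]+)\))?(?P<lab>[A-Z][A-Za-z0-9]*)$"
)

KIND_NAMES = {"tm": "targeted mutation", "em": "endonuclease-mediated mutation"}


@dataclass
class AlleleSymbol:
    gene: str
    superscript: str
    kind: Optional[str] = None      # filled only for tm/em designations
    serial: Optional[int] = None
    suffix: Optional[str] = None
    tag: Optional[str] = None
    lab_code: Optional[str] = None


def parse_allele(symbol: str) -> AlleleSymbol:
    """Split a plain-text allele symbol such as 'Trp53<tm1Tyj>' into parts."""
    m = ALLELE_RE.match(symbol.strip())
    if m is None:
        raise ValueError(f"not a plain-text allele symbol: {symbol!r}")
    allele = AlleleSymbol(gene=m["gene"], superscript=m["sup"])
    mm = MUTATION_RE.match(allele.superscript)
    if mm is not None:
        # Superscript follows the assumed tm/em pattern; record its parts.
        allele.kind = KIND_NAMES[mm["kind"]]
        allele.serial = int(mm["serial"])
        allele.suffix = mm["suffix"]
        allele.tag = mm["tag"]
        allele.lab_code = mm["lab"]
    return allele


if __name__ == "__main__":
    for s in ["Trp53<tm1Tyj>", "Gt(ROSA)26Sor<tm1(EYFP)Cos>", "Kit<W>"]:
        print(parse_allele(s))
```

Here Gt(ROSA)26Sor<tm1(EYFP)Cos> parses as a targeted mutation with tag EYFP and lab code Cos, while Kit<W> keeps only its gene and superscript because W is not a tm/em designation. A real implementation would need MGI's full rule set rather than two regular expressions.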


F1000Research, 2020, Vol 9, pp. 1493. Author(s): Sehyun Oh, Jasmine Abdelnabi, Ragheed Al-Dulaimi, Ayush Aggarwal, Marcel Ramos, ...

Gene symbols are recognizable identifiers for gene names, but they are unstable and error-prone because of aliasing, manual entry, and unintentional conversion to dates by spreadsheets. Official gene symbol resources such as the HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but they lack a programmatic interface and do not correct symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, as well as common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30–40% in the earliest platforms from 2002–03. HGNChelper is installable from CRAN.
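HGNChelper itself is an R package; the Python sketch below only illustrates the kind of correction the abstract describes: reversing spreadsheet date conversions such as 1-Mar (from MARCH1) or 2-Sep (from SEPT2), then mapping outdated symbols to current ones. The function names, the month-to-prefix table, and the alias table are hypothetical and deliberately tiny; the alias entries reflect HGNC renamings as we understand them and should be checked against HGNC before any real use.

```python
import re
from typing import Optional, Tuple

# Month abbreviations that spreadsheets can produce from gene symbols, mapped
# back to the legacy symbol prefix. Illustrative subset only (assumption).
MONTH_TO_PREFIX = {"mar": "MARCH", "sep": "SEPT"}

# Hypothetical alias table: outdated human symbol -> current HGNC symbol.
# Verify every entry against HGNC before relying on it.
ALIASES = {"MARCH1": "MARCHF1", "SEPT2": "SEPTIN2", "SEPT9": "SEPTIN9"}

DAY_MONTH = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")  # e.g. "1-Mar"
MONTH_DAY = re.compile(r"^([A-Za-z]{3})-(\d{1,2})$")  # e.g. "Sep-9"


def undo_date_conversion(value: str) -> Optional[str]:
    """Return the legacy symbol hidden in a date-like string, or None."""
    text = value.strip()
    m = DAY_MONTH.match(text)
    if m:
        day, month = m.group(1), m.group(2)
    else:
        m = MONTH_DAY.match(text)
        if not m:
            return None
        month, day = m.group(1), m.group(2)
    prefix = MONTH_TO_PREFIX.get(month.lower())
    if prefix is None:
        return None  # a date, but not one this sketch knows how to reverse
    return f"{prefix}{int(day)}"  # int() drops leading zeros, e.g. "01"


def correct_symbol(value: str) -> Tuple[str, bool]:
    """Return (suggested symbol, whether it differs from the input)."""
    candidate = undo_date_conversion(value) or value.strip()
    suggestion = ALIASES.get(candidate, candidate)
    return suggestion, suggestion != value


if __name__ == "__main__":
    for raw in ["1-Mar", "2-Sep", "Sep-9", "TP53", "5-Jan"]:
        print(raw, "->", correct_symbol(raw))
```

The two steps are kept separate on purpose: date reversal recovers what the user originally typed, and the alias lookup then decides whether that symbol is still current.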


2020. Author(s): Olga Krasheninina, Yih-Chii Hwang, Xiaodong Bai, Aleksandra Zalcman, Evan Maxwell, ...

Standardized genome informatics protocols minimize reprocessing costs and facilitate harmonization across studies when implemented in a transparent, accessible, and reproducible manner. Here we define the OQFE protocol, a lossless read-mapping protocol that retains key features of existing standard NGS methods. We demonstrate that variants can be called directly from NovaSeq OQFE data without base quality score recalibration, and we describe a large-scale variant-calling protocol for OQFE data. The OQFE protocol is open source, and a containerized implementation is provided.


2020. Author(s): Sehyun Oh, Jasmine Abdelnabi, Ragheed Al-Dulaimi, Ayush Aggarwal, Marcel Ramos, ...

Gene symbols are recognizable identifiers for gene names, but they are unstable and error-prone because of aliasing, manual entry, and unintentional conversion to dates by spreadsheets. Official gene symbol resources such as the HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but they lack a programmatic interface and do not correct symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, as well as common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30–40% in the earliest platforms from 2002–03. HGNChelper is installable from CRAN, with open development and issue tracking on GitHub and an associated pkgdown site at https://waldronlab.io/HGNChelper/.


Database, 2020, Vol 2020. Author(s): Xiangying Jiang, Pengyuan Li, James Kadin, Judith A Blake, Martin Ringwald, ...

Gathering information from the scientific literature is essential for biomedical research, as much knowledge is conveyed through publications. However, the large and rapidly increasing publication rate makes it impractical for researchers to quickly identify all, and only, the documents related to their interests. Automated biomedical document classification has therefore attracted much interest. Such classification is critical in the curation of biological databases, because biocurators must scan a vast number of articles to identify pertinent information within the documents most relevant to the database. This is a slow, labor-intensive process that can benefit from effective automation. We present a document classification scheme that aims to identify papers containing information relevant to a specific topic within a large collection of articles, to support the biocuration classification task. Our framework builds on a meta-classification scheme we introduced previously; here we incorporate features gathered from figure captions in addition to those obtained from titles and abstracts. We trained and tested our classifier on a large imbalanced dataset originally curated by the Gene Expression Database (GXD), which collects all the gene expression information in the Mouse Genome Informatics (MGI) resource. As part of the MGI literature classification pipeline, GXD curators identify MGI-selected papers that are relevant for GXD. The dataset consists of ~60 000 documents (5469 labeled as relevant; 52 866 as irrelevant), gathered throughout 2012–2016, in which each document is represented by the text of its title, abstract, and figure captions. Our classifier attains precision 0.698, recall 0.784, F-measure 0.738, and Matthews correlation coefficient (MCC) 0.711, demonstrating that the proposed framework effectively addresses the high class imbalance in the GXD classification task. Moreover, using information from figure captions significantly improves our classifier's performance compared with using titles and abstracts alone, showing that figure captions provide substantial information for supporting biomedical document classification and curation. Database URL:
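The F-measure reported above is the harmonic mean of precision and recall, and MCC uses all four cells of the confusion matrix, which makes it informative on imbalanced data such as the GXD set. The minimal Python sketch below uses our own function names. Recomputing F1 from the rounded precision and recall gives about 0.7385, consistent with the reported 0.738 up to rounding. A baseline that labels every document irrelevant, using the dataset's class counts purely for illustration (not the paper's evaluation split), scores about 0.906 accuracy but an MCC of 0.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all documents that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any row or column total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # F1 recomputed from the rounded precision and recall in the abstract.
    print(round(f_measure(0.698, 0.784), 4))  # 0.7385

    # Illustrative all-negative baseline on the dataset's class counts
    # (5469 relevant, 52 866 irrelevant): nothing is predicted relevant.
    tp, fp, fn, tn = 0, 0, 5469, 52866
    print(round(accuracy(tp, fp, fn, tn), 3))  # 0.906
    print(mcc(tp, fp, fn, tn))                 # 0.0
```

The contrast between the two baseline numbers is the reason imbalance-aware metrics such as MCC are reported for this task: high accuracy can be reached without finding a single relevant paper.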


Author(s): Evaggelia Barba, Evangelia-Eirini Tsermpini, George P. Patrinos, Maria Koromina

2019. Author(s): Zhenglin Zhu, Zhufen Guan, Gexin Liu, Yawang Wang, Ze Zhang

Although the domestic silkworm (Bombyx mori) is an important model and economic animal, no comprehensive database for this organism has been available. Here, we developed the silkworm genome informatics database, SGID. It aims to bring together all silkworm-related biological data and provide an interactive platform for gene inquiry and analysis. The functional annotation in SGID is thorough and covers 98% of silkworm genes. Annotation details include function description, Gene Ontology, KEGG, pathway, subcellular location, transmembrane topology, protein secondary/tertiary structure, homologous group, and transcription factor. SGID provides genome-scale visualization of population genetics test results based on high-depth resequencing data from 158 silkworm samples. It also provides interactive analysis tools for transcriptomic and epigenomic data from 79 NCBI BioProjects. SGID is freely available at http://sgid.popgenetics.net. We expect this database to be a useful resource for future silkworm research.

