Genome Informatics: Latest Research Papers

Mouse Genome Informatics (MGI): latest news from MGD and GXD

Mammalian Genome ◽ 10.1007/s00335-021-09921-0 ◽ 2021

Author(s): Martin Ringwald ◽ Joel E. Richardson ◽ Richard M. Baldarelli ◽ Judith A. Blake ◽ James A. Kadin ◽ ...

Keyword(s): Gene Expression ◽ Mouse Genome ◽ Primary Source ◽ Mouse Genome Database ◽ Mouse Genome Informatics ◽ Genome Database ◽ Functional Annotations ◽ Developmental Gene Expression ◽ Health And Disease ◽ Genome Informatics

Abstract: The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI’s mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models with human diseases. MGI maintains mouse anatomy and phenotype ontologies, contributes to the development of the Gene Ontology and Disease Ontology, and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI’s two major knowledgebases. Here, we highlight some of the recent changes and enhancements to MGD and GXD that have been implemented in response to changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org.

Murine allele and transgene symbols: ensuring unique, concise, and informative nomenclature

Mammalian Genome ◽ 10.1007/s00335-021-09902-3 ◽ 2021

Author(s): M. N. Perry ◽ C. L. Smith

Keyword(s): Mouse Strain ◽ Mouse Genome ◽ Model Organism ◽ Mouse Genome Informatics ◽ International Committee ◽ Expression Data ◽ Multiple Resources ◽ Naturally Occurring ◽ Genome Informatics ◽ Cross Species Comparison

Abstract: In addition to naturally occurring sequence variation and spontaneous mutations, a wide array of technologies exists for modifying the mouse genome. Standardized nomenclature, including allele, transgene, and other mutation nomenclature, as well as persistent unique identifiers (PUIDs), is critical for effective scientific communication, comparison of results, and integration of data into knowledgebases such as Mouse Genome Informatics (MGI), the Alliance of Genome Resources, and the International Mouse Strain Resource (IMSR). As well as being the authoritative source for mouse gene, allele, and strain nomenclature, MGI integrates published and unpublished genomic, phenotypic, and expression data while linking to other online resources for a complete view of the mouse as a valuable model organism. The International Committee on Standardized Genetic Nomenclature for Mice has developed allele nomenclature rules and guidelines that take into account the number of genes impacted, the method of allele generation, and the nature of the sequence alteration. To capture details that cannot be included in allele symbols, MGI has further developed allele-to-gene relationships using Sequence Ontology (SO) definitions for mutations that provide links between alleles and the genes affected. MGI is also using Human Genome Variation Society (HGVS) variant nomenclature for variants associated with alleles, which will enhance searching for mutations and improve cross-species comparison. With the ability to assign unique and informative symbols as well as to link alleles with more than one gene, allele and transgene nomenclature rules and guidelines provide an unambiguous way to represent alterations in the mouse genome and facilitate data integration among multiple resources such as the Alliance of Genome Resources and the International Mouse Strain Resource.

HGNChelper: identification and correction of invalid gene symbols for human and mouse

F1000Research ◽ 10.12688/f1000research.28033.1 ◽ 2020 ◽ Vol 9 ◽ pp. 1493

Author(s): Sehyun Oh ◽ Jasmine Abdelnabi ◽ Ragheed Al-Dulaimi ◽ Ayush Aggarwal ◽ Marcel Ramos ◽ ...

Keyword(s): Mouse Genome ◽ Mouse Genome Informatics ◽ R Package ◽ Gene Expression Omnibus ◽ Gene Symbol ◽ P Gene ◽ Human Genes ◽ Human And Mouse ◽ Genome Informatics ◽ Gene Symbols

Abstract: Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (mSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN.
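
The spreadsheet problem described in this abstract can be illustrated with a minimal Python sketch. It is not HGNChelper's implementation (HGNChelper is an R package, and its internals are not described here); it only flags symbols that look like dates. The function names `looks_date_mangled` and `flag_suspect_symbols`, and the two date patterns checked (`1-Sep` and `Sep-01`), are assumptions made for illustration.

```python
import re

# Three-letter month abbreviations that a spreadsheet may emit after
# silently converting a gene symbol to a date.
MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}

# ASSUMPTION: only two renderings are checked, day-month ("1-Sep")
# and month-day ("Sep-01"). Real data may contain other formats.
_DAY_MONTH = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")
_MONTH_DAY = re.compile(r"^([A-Za-z]{3})-(\d{1,2})$")


def looks_date_mangled(symbol: str) -> bool:
    """Return True if the string looks like a date rather than a gene symbol."""
    s = symbol.strip()
    m = _DAY_MONTH.match(s)
    if m and m.group(2).lower() in MONTHS:
        return True
    m = _MONTH_DAY.match(s)
    return bool(m and m.group(1).lower() in MONTHS)


def flag_suspect_symbols(symbols):
    """Return the subset of symbols that appear to be date-converted."""
    return [s for s in symbols if looks_date_mangled(s)]


suspects = flag_suspect_symbols(["TP53", "1-Sep", "Mar-02", "BRCA1", "A-1"])
print(suspects)
```

Flagging is deliberately separated from correction: a date-like string cannot always be mapped back to a single symbol, so any correction step would need an authoritative symbol table such as those the abstract attributes to HGNC and MGI.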

Open-source mapping and variant calling for large-scale NGS data from original base-quality scores

10.1101/2020.12.15.356360 ◽ 2020

Author(s): Olga Krasheninina ◽ Yih-Chii Hwang ◽ Xiaodong Bai ◽ Aleksandra Zalcman ◽ Evan Maxwell ◽ ...

Keyword(s): Open Source ◽ Large Scale ◽ Variant Calling ◽ Quality Score ◽ Read Mapping ◽ Base Quality Score ◽ Key Features ◽ NGS Data ◽ Genome Informatics ◽ Reproducible Manner

Abstract: Standardized genome informatics protocols minimize reprocessing costs and facilitate harmonization across studies if implemented in a transparent, accessible and reproducible manner. Here we define the OQFE protocol, a lossless read-mapping protocol that retains key features of existing NGS standard methods. We demonstrate that variants can be called directly from NovaSeq OQFE data without the need for base quality score recalibration and describe a large-scale variant calling protocol for OQFE data. The OQFE protocol is open-source and a containerized implementation is provided.

HGNChelper: identification and correction of invalid gene symbols for human and mouse

10.1101/2020.09.16.300632 ◽ 2020

Author(s): Sehyun Oh ◽ Jasmine Abdelnabi ◽ Ragheed Al-Dulaimi ◽ Ayush Aggarwal ◽ Marcel Ramos ◽ ...

Keyword(s): Mouse Genome ◽ Mouse Genome Informatics ◽ R Package ◽ Gene Expression Omnibus ◽ Gene Symbol ◽ Human Genes ◽ Open Development ◽ Human And Mouse ◽ Genome Informatics ◽ Gene Symbols

Abstract: Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (mSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN, with open development and issue tracking on GitHub and an associated pkgdown site at https://waldronlab.io/HGNChelper/.

Guest Editorial for the 29th International Conference on Genome Informatics (GIW 2018)

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽ 10.1109/tcbb.2020.2978606 ◽ 2020 ◽ Vol 17 (3) ◽ pp. 726-727

Author(s): Jie Zheng ◽ Jinyan Li ◽ Yun Zheng

Keyword(s): Guest Editorial ◽ International Conference ◽ Genome Informatics

Mouse Genome Informatics

10.32388/tr0o0b ◽ 2020

Author(s):

Keyword(s): Mouse Genome ◽ Mouse Genome Informatics ◽ Genome Informatics

Integrating image caption information into biomedical document classification in support of biocuration

Database ◽ 10.1093/database/baaa024 ◽ 2020 ◽ Vol 2020

Author(s): Xiangying Jiang ◽ Pengyuan Li ◽ James Kadin ◽ Judith A Blake ◽ Martin Ringwald ◽ ...

Keyword(s): Gene Expression ◽ Classification Scheme ◽ Mouse Genome Informatics ◽ Document Classification ◽ Publication Rate ◽ Classification Task ◽ Biological Databases ◽ Vast Number ◽ Pertinent Information ◽ Genome Informatics

Abstract: Gathering information from the scientific literature is essential for biomedical research, as much knowledge is conveyed through publications. However, the large and rapidly increasing publication rate makes it impractical for researchers to quickly identify all and only those documents related to their interest. As such, automated biomedical document classification attracts much interest. Such classification is critical in the curation of biological databases, because biocurators must scan through a vast number of articles to identify pertinent information within documents most relevant to the database. This is a slow, labor-intensive process that can benefit from effective automation. We present a document classification scheme aiming to identify papers containing information relevant to a specific topic, among a large collection of articles, for supporting the biocuration classification task. Our framework is based on a meta-classification scheme we have introduced before; here we incorporate into it features gathered from figure captions, in addition to those obtained from titles and abstracts. We trained and tested our classifier over a large imbalanced dataset, originally curated by the Gene Expression Database (GXD). GXD collects all the gene expression information in the Mouse Genome Informatics (MGI) resource. As part of the MGI literature classification pipeline, GXD curators identify MGI-selected papers that are relevant for GXD. The dataset consists of ~60 000 documents (5469 labeled as relevant; 52 866 as irrelevant), gathered throughout 2012–2016, in which each document is represented by the text of its title, abstract and figure captions. Our classifier attains precision 0.698, recall 0.784, f-measure 0.738 and Matthews correlation coefficient 0.711, demonstrating that the proposed framework effectively addresses the high imbalance in the GXD classification task. Moreover, our classifier’s performance is significantly improved by utilizing information from image captions compared to using titles and abstracts alone; this observation clearly demonstrates that image captions provide substantial information for supporting biomedical document classification and curation. Database URL:
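
The metrics quoted in this abstract can be cross-checked with a short Python sketch. The f-measure follows directly from the reported precision and recall. The Matthews correlation coefficient needs a confusion matrix, which the abstract does not give, so the counts below are back-calculated from the reported precision, recall, and class sizes. That back-calculation is an assumption made for illustration (it treats the figures as if they came from a single pass over the full dataset), not a result reported by the authors.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Figures quoted in the abstract.
PRECISION, RECALL = 0.698, 0.784
N_RELEVANT, N_IRRELEVANT = 5469, 52866

# ASSUMPTION: approximate counts reconstructed from the rounded metrics;
# these are illustrative, not values published with the paper.
tp = round(RECALL * N_RELEVANT)      # relevant documents retrieved
fp = round(tp / PRECISION) - tp      # irrelevant documents retrieved
fn = N_RELEVANT - tp                 # relevant documents missed
tn = N_IRRELEVANT - fp               # irrelevant documents rejected

f1_value = f_measure(PRECISION, RECALL)
mcc_value = mcc(tp, fp, fn, tn)
print(f"F1 = {f1_value:.3f}, MCC = {mcc_value:.3f}")
```

Under these assumptions both values should land close to the reported 0.738 and 0.711. Small differences are expected because the inputs are rounded to three decimals.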

Genome Informatics Pipelines and Genome Browsers

Applied Genomics and Public Health ◽ 10.1016/b978-0-12-813695-9.00008-x ◽ 2020 ◽ pp. 149-169

Author(s): Evaggelia Barba ◽ Evangelia-Eirini Tsermpini ◽ George P. Patrinos ◽ Maria Koromina

Keyword(s): Genome Browsers ◽ Genome Informatics

SGID: a comprehensive and interactive database of the silkworm

10.1101/739961 ◽ 2019

Author(s): Zhenglin Zhu ◽ Zhufen Guan ◽ Gexin Liu ◽ Yawang Wang ◽ Ze Zhang

Keyword(s): Tertiary Structure ◽ Subcellular Location ◽ Biological Data ◽ Test Results ◽ Function Annotation ◽ Interactive Analysis ◽ Silkworm Genome ◽ Genome Scale ◽ Genome Informatics ◽ High Depth

Abstract: Although the domestic silkworm (Bombyx mori) is an important model and economic animal, no comprehensive database exists for this organism. Here, we developed the silkworm genome informatics database, SGID. It aims to bring together all silkworm-related biological data and provide an interactive platform for gene inquiry and analysis. The function annotation in SGID is thorough and covers 98% of the silkworm genes. The annotation details include function description, gene ontology, KEGG, pathway, subcellular location, transmembrane topology, protein secondary/tertiary structure, homologous group and transcription factor. SGID provides genome-scale visualization of population genetics test results based on high-depth resequencing data from 158 silkworm samples. It also provides interactive analysis tools for transcriptomic and epigenomic data from 79 NCBI BioProjects. SGID is freely available at http://sgid.popgenetics.net. This database will be extremely useful to silkworm research in the future.
