History of rare diseases and their genetic causes - a data driven approach

Mapping Intimacies, 2019. DOI: 10.1101/595819

Author(s): Friederike Ehrhart, Egon L. Willighagen, Martina Kutmon, Max van Hoften, Nasim Bahram Sangani, ...

Keyword(s): Rare Disease; Rare Diseases; Scientific Publication; Data Driven; Semantic Model; Genetic Causes; Monogenic Diseases; Data Driven Approach; History Of; First Time

Abstract: This dataset provides information about monogenic, rare diseases with a known genetic cause, supplemented with manually extracted provenance for both the disease and the discovery of its underlying genetic cause. We collected 4166 rare monogenic diseases, identified by their OMIM identifiers, and linked them to 3163 causative genes annotated with Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publication that first described each rare disease and of the publication that identified the causative gene were added using information from OMIM, Wikipedia, Google Scholar, Whonamedit, and PubMed. The data are available as a spreadsheet and as RDF in a semantic model modified from DisGeNET. This dataset relies on publicly available data and publications with PubMed IDs, but to our knowledge this is the first time these data have been linked and made available for further study under a liberal license. Analysis of the data reveals the timeline of rare disease and causative gene discovery and links it to developments in methods and databases.
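The timeline analysis described above can be illustrated with a minimal Python sketch. It assumes a hypothetical CSV export with one row per disease–gene pair and with publication years already resolved from the two PubMed identifiers; the column names (`disease_id`, `gene_symbol`, `disease_pub_year`, `gene_pub_year`) and the toy rows are invented for illustration and do not reflect the dataset's actual layout or content.

```python
import csv
import io
from collections import Counter

# Hypothetical toy export: column names and rows are invented for illustration.
TOY_CSV = """disease_id,gene_symbol,disease_pub_year,gene_pub_year
DISEASE_A,GENE_A,1912,1991
DISEASE_B,GENE_B,1965,2005
DISEASE_C,GENE_C,1988,2012
"""


def discovery_gaps(rows):
    """Years between the first disease description and the causative-gene discovery."""
    return {
        row["disease_id"]: int(row["gene_pub_year"]) - int(row["disease_pub_year"])
        for row in rows
    }


def counts_per_decade(rows, year_column):
    """Number of publications per decade for one year column."""
    return Counter((int(row[year_column]) // 10) * 10 for row in rows)


rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
gaps = discovery_gaps(rows)
disease_decades = counts_per_decade(rows, "disease_pub_year")
gene_decades = counts_per_decade(rows, "gene_pub_year")
print(gaps)
print(sorted(disease_decades.items()), sorted(gene_decades.items()))
```

Comparing the two per-decade histograms is one simple way to see how gene discovery lags behind clinical description.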


A resource to explore the discovery of rare diseases and their causative genes

Scientific Data, 2021, Vol 8 (1). DOI: 10.1038/s41597-021-00905-y

Author(s): Friederike Ehrhart, Egon L. Willighagen, Martina Kutmon, Max van Hoften, Leopold M. G. Curfs, ...

Keyword(s): Rare Disease; Rare Diseases; Genetic Background; Gene Discovery; Google Scholar; Semantic Model; Causative Gene; Scientific Publications; Monogenic Diseases; First Time

Abstract: Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and for the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases and of the publications that identified the causative genes were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but by making the data interoperable and linked, we can now analyse it. Our analysis revealed the timeline of rare disease and causative gene discovery and linked it to developments in methods.
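To make the idea of linked, interoperable data concrete, here is a small sketch of how one disease–gene record with its two provenance PubMed identifiers might be serialised as N-Triples in plain Python. The `http://example.org/raredisease/` namespace, the class and predicate names (`GeneDiseaseAssociation`, `hasDisease`, `hasGene`, `firstDescribedIn`, `geneFirstReportedIn`), and the placeholder identifiers are hypothetical; they are not the DisGeNET-derived semantic model or the Wikidata properties the authors used. Only `rdf:type` is a real vocabulary term.

```python
# Hypothetical namespace and terms; NOT the authors' DisGeNET-derived model.
EX = "http://example.org/raredisease/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(local_name):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Plain N-Triples string literal with backslashes and quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def association_triples(assoc_id, disease_id, gene_id, disease_pmid, gene_pmid):
    """Triples for one disease-gene association and its two provenance PMIDs."""
    assoc = iri(f"association/{assoc_id}")
    disease = iri(f"disease/{disease_id}")
    gene = iri(f"gene/{gene_id}")
    return [
        f"{assoc} {RDF_TYPE} {iri('GeneDiseaseAssociation')} .",
        f"{assoc} {iri('hasDisease')} {disease} .",
        f"{assoc} {iri('hasGene')} {gene} .",
        # Disease provenance hangs off the disease; gene provenance off the association.
        f"{disease} {iri('firstDescribedIn')} {literal(disease_pmid)} .",
        f"{assoc} {iri('geneFirstReportedIn')} {literal(gene_pmid)} .",
    ]


triples = association_triples(
    "A1", "DISEASE_A", "GENE_A", "PMID_PLACEHOLDER_1", "PMID_PLACEHOLDER_2"
)
print("\n".join(triples))
```

Attaching the gene-discovery publication to the association rather than to the gene reflects that one gene can be linked to several diseases, each discovered in a different paper.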


Rare disease knowledge enrichment through a data-driven approach

BMC Medical Informatics and Decision Making, 2019, Vol 19 (1). DOI: 10.1186/s12911-019-0752-9

Cited By ~ 12

Author(s): Feichen Shen, Yiqing Zhao, Liwei Wang, Majid Rastegar Mojarad, Yanshan Wang, ...

Keyword(s): Rare Disease; Data Driven; Disease Knowledge; Data Driven Approach; Knowledge Enrichment


Best Paper Selection

Yearbook of Medical Informatics, 2020, Vol 29 (01), pp. 167-168. DOI: 10.1055/s-0040-1702011

Keyword(s): Information Systems; Electronic Health Record; Rare Disease; Data Driven; Health Record; Quality Changes; Disease Knowledge; Data Driven Approach; Knowledge Enrichment; Electronic Health

Burek P, Scherf N, Herre H. Ontology patterns for the representation of quality changes of cells in time. J Biomed Semantics 2019;10(1):16. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-019-0206-4

Denaxas S, Gonzalez-Izquierdo A, Direk K, Fitzpatrick NK, Fatemifar G, Banerjee A, Dobson RJB, Howe LJ, Kuan V, Lumbers RT, Pasea L, Patel RS, Shah AD, Hingorani AD, Sudlow C, Hemingway H. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc 2019;26(12):1545-59. https://academic.oup.com/jamia/article/26/12/1545/5536916

Rector A, Schulz S, Rodrigues J-M, Chute CG, Solbrig H. On beyond Gruber: “Ontologies” in today's biomedical information systems and the limits of OWL. J Biomed Inform X 2019;2:100002.

Shen F, Zhao Y, Wang L, Mojarad MR, Wang Y, Liu S, Liu H. Rare disease knowledge enrichment through a data-driven approach. BMC Med Inform Decis Mak 2019;19(1):32. https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-019-0752-9


Solve-RD: systematic pan-European data sharing and collaborative analysis to solve rare diseases

European Journal of Human Genetics, 2021. DOI: 10.1038/s41431-021-00859-0

Author(s): Birte Zurek, Kornelia Ellwanger, Lisenka E. L. M. Vissers, Rebecca Schüle, ...

Keyword(s): Rare Disease; Rare Diseases; Disease Genes; Minimum Requirement; Horizon 2020; European Reference Networks; Patient Representatives; Reference Networks; Collaborative Analysis; First Time

Abstract: For the first time in Europe, hundreds of rare disease (RD) experts are teaming up to actively share and jointly analyse existing patient data. Solve-RD is a Horizon 2020-supported EU flagship project bringing together >300 clinicians, scientists, and patient representatives from 51 sites in 15 countries. Solve-RD is built upon a core group of four European Reference Networks (ERNs: ERN-ITHACA, ERN-RND, ERN-Euro NMD, ERN-GENTURIS), which annually see more than 270,000 RD patients with the respective pathologies. The main ambition is to solve rare diseases whose molecular cause is not yet known. This is to be achieved through an innovative clinical research environment that introduces novel ways to organise expertise and data. Two major approaches are being pursued: (i) massive data re-analysis of >19,000 unsolved rare disease patients and (ii) novel combined -omics approaches. The minimum requirement for eligibility for the analysis activities is an inconclusive exome that can be shared under controlled access. The first preliminary data re-analysis has already diagnosed 255 cases from 8393 exome/genome datasets. This unprecedented degree of collaboration, focused on sharing data and expertise, is expected to identify many new disease genes and enable diagnosis of many so-far undiagnosed patients from all over Europe.
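The preliminary re-analysis figure corresponds to a diagnostic yield of roughly 3% per dataset. The sketch below computes that yield with a Wilson score interval, under the simplifying assumption (not stated in the abstract) that each of the 8393 exome/genome datasets is an independent trial with at most one diagnosis.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half_width, centre + half_width


solved, datasets = 255, 8393  # figures quoted in the abstract
diagnostic_yield = solved / datasets
low, high = wilson_interval(solved, datasets)
print(f"yield {diagnostic_yield:.2%} (95% CI {low:.2%} to {high:.2%})")
```

The Wilson interval is used here instead of the normal approximation because it behaves better for small proportions; with n this large the two are close.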


Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts

JAMIA Open, 2021, Vol 4 (1). DOI: 10.1093/jamiaopen/ooab011

Author(s): Fuchiang R Tsui, Lingyun Shi, Victor Ruiz, Neal D Ryan, Candice Biernesser, ...

Keyword(s): Machine Learning; Natural Language Processing; Suicide Attempt; Language Processing; Suicide Attempts; Large Data; Structured Data; Data Driven; Data Driven Approach; First Time

Abstract: Objective: Limited research exists on predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data. Methods: This case-control study included patients aged 10–75 years who were seen in emergency departments and inpatient units between 2007 and 2016. Cases were first-time suicide attempts identified from coded diagnoses; controls were randomly selected patients without suicide attempts, regardless of demographics, at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2 years of historical EHR data prior to the suicide attempt or control index visit, with prediction windows from 7 to 730 days. Patients without any historical notes were excluded. Model accuracy and robustness were evaluated on a blind dataset (30% of the cohort). Results: The study cohort included 45 238 patients (5099 cases, 40 139 controls), comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data resulted in significantly greater accuracy than structured data alone (area under the curve [AUC]: 0.932 vs. 0.901, P < .001). The best-predicting model used 1726 variables, with AUC = 0.932 (95% CI, 0.922–0.941). The model was robust across multiple prediction windows and across subgroups defined by demographics, the point of the most recent historical clinical contact, and depression diagnosis history. Conclusions: Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust first-time suicide attempt prediction, and has the potential to be deployed across various populations and clinical settings.
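The AUC values reported above equal the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. A minimal rank-based (Mann–Whitney U) sketch of that computation follows; the score lists are toy values, and this is a generic AUC calculation, not the study's evaluation code.

```python
def auc_from_ranks(case_scores, control_scores):
    """AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    labelled = sorted(
        [(score, 1) for score in case_scores] + [(score, 0) for score in control_scores],
        key=lambda item: item[0],
    )
    rank_sum_cases = 0.0
    i = 0
    while i < len(labelled):
        # Find the run of tied scores labelled[i..j].
        j = i
        while j + 1 < len(labelled) and labelled[j + 1][0] == labelled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tie run
        cases_in_run = sum(label for _, label in labelled[i : j + 1])
        rank_sum_cases += average_rank * cases_in_run
        i = j + 1
    n_cases, n_controls = len(case_scores), len(control_scores)
    u_statistic = rank_sum_cases - n_cases * (n_cases + 1) / 2
    return u_statistic / (n_cases * n_controls)


# 8 of the 9 case-control pairs are correctly ordered, so AUC = 8/9.
print(auc_from_ranks([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

Sorting makes this O(n log n) rather than comparing every pair, which matters at cohort scale: 5099 cases × 40 139 controls would be about 2 × 10^8 pairwise comparisons.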


Slinker: Visualising novel splicing events in RNA-Seq data

F1000Research, 2021, Vol 10, pp. 1255. DOI: 10.12688/f1000research.74836.1

Author(s): Breon Schmidt, Marek Cmero, Paul Ekert, Nadia Davidson, Alicia Oshlack

Keyword(s): Rare Disease; Human Genome; RNA Sequencing; Reference Genome; Data Driven; RNA-Seq; Bioinformatics Pipeline; Link Type; Muscular Disorders; Data Driven Approach

Visualisation of the transcriptome relative to a reference genome is hampered by sparsity, because RNA sequencing (RNA-Seq) reads map predominantly to exons, which account for just under 3% of the human genome. We have recently used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data by omitting supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, retaining only annotated exons loses information, which is a significant drawback. Here we present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses StringTie2 to assemble transcripts with any sequence across any gene. This assembly is merged with the reference transcripts and converted to a superTranscript, from which rich visualisations with associated annotation and coverage information are generated using Plotly. Slinker was validated on five novel splicing events in rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events in the important leukemia gene IKZF1 within transcriptomes of tumour samples. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions and is freely available on GitHub.
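The superTranscript idea, collapsing a gene's exonic sequence into one linear reference, can be sketched in a simplified form. The Python below assumes that, for a single gene on one strand, merging the union of exon intervals from the reference and the sample-specific assembled transcripts is a fair approximation; Slinker itself relies on StringTie2 assembly and a full superTranscript construction, so this is not its implementation. The intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open genomic interval [start, end)


def merge_exons(transcripts: List[List[Interval]]) -> List[Interval]:
    """Union of all exon intervals across transcripts, sorted and merged."""
    merged: List[Interval] = []
    for start, end in sorted(iv for tx in transcripts for iv in tx):
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting exon: extend the current block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_super_coordinate(position: int, blocks: List[Interval]) -> int:
    """Map a genomic position onto the concatenated superTranscript blocks."""
    offset = 0
    for start, end in blocks:
        if start <= position < end:
            return offset + (position - start)
        offset += end - start
    raise ValueError(f"position {position} is not in any exonic block")


reference = [[(100, 200), (500, 600)]]
assembled = [[(100, 200), (300, 350), (500, 650)]]  # novel exon and extended last exon
blocks = merge_exons(reference + assembled)
print(blocks)                            # [(100, 200), (300, 350), (500, 650)]
print(to_super_coordinate(320, blocks))  # 120
```

Because the novel exon at 300–350 comes only from the assembled transcript, it appears in the superTranscript but would be missing from a reference-only one, which is the information loss the abstract describes.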


From oscillation dip to oscillation valley in atmospheric neutrino experiments

The European Physical Journal C, 2021, Vol 81 (2). DOI: 10.1140/epjc/s10052-021-08946-8

Cited By ~ 1

Author(s): Anil Kumar, Amina Khatun, Sanjib Kumar Agarwalla, Amol Dighe

Keyword(s): Atmospheric Neutrino; Simulated Data; Data Driven; Muon Energy; Identification Algorithm; Statistical Fluctuations; Data Driven Approach; Neutrino Experiments; First Time; Atmospheric Mass

Abstract: Atmospheric neutrino experiments can show the “oscillation dip” feature in data, owing to their sensitivity over a large L/E range. In experiments that can distinguish between neutrinos and antineutrinos, such as INO, oscillation dips can be observed in both channels separately. We present a dip-identification algorithm employing a data-driven approach, one that uses the asymmetry between upward-going and downward-going events, binned in the reconstructed L/E of muons, to demonstrate the dip, which would confirm the oscillation hypothesis. We further propose, for the first time, the identification of an “oscillation valley” in the reconstructed ($E_\mu$, $\cos\theta_\mu$) plane, feasible for detectors like ICAL with excellent muon energy and direction resolution. We illustrate how this two-dimensional valley would offer a clear visual representation and test of the L/E dependence, with the alignment of the valley quantifying the atmospheric mass-squared difference. Owing to the charge identification capability of the ICAL detector at INO, we always present our results using $\mu^-$ and $\mu^+$ events separately. Taking into account statistical fluctuations and systematic errors, and varying the oscillation parameters over their currently allowed ranges, we estimate the precision with which atmospheric neutrino oscillation parameters would be determined using our procedure with 10 years of simulated ICAL data.
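The physics behind the dip can be illustrated with the standard two-flavour vacuum ν_μ survival probability, P = 1 − sin²2θ sin²(1.267 Δm² L/E) with L in km, E in GeV, and Δm² in eV², whose first minimum lies at L/E = π/(2 × 1.267 Δm²). The sketch below evaluates it, plus a simple up/down asymmetry (U − D)/(U + D), for true rather than reconstructed L/E. The parameter values (Δm² = 2.5 × 10⁻³ eV², maximal mixing, a 15 km production height) and the assumption of an up/down-symmetric flux are illustrative; matter effects, detector resolution, and the authors' actual binning are all ignored.

```python
import math

EARTH_RADIUS_KM = 6371.0
PRODUCTION_HEIGHT_KM = 15.0  # assumed typical production altitude


def path_length_km(cos_zenith):
    """Distance from production point to detector; cos_zenith = +1 is down-going."""
    r, h = EARTH_RADIUS_KM, PRODUCTION_HEIGHT_KM
    return math.sqrt((r + h) ** 2 - r ** 2 * (1.0 - cos_zenith ** 2)) - r * cos_zenith


def survival_prob(l_over_e, dm2=2.5e-3, sin2_2theta=1.0):
    """Two-flavour vacuum nu_mu survival probability; L/E in km/GeV, dm2 in eV^2."""
    return 1.0 - sin2_2theta * math.sin(1.267 * dm2 * l_over_e) ** 2


def up_down_asymmetry(energy_gev, abs_cos, dm2=2.5e-3):
    """(U - D)/(U + D) for mirror-image zenith bins, assuming up/down-symmetric flux."""
    up = survival_prob(path_length_km(-abs_cos) / energy_gev, dm2)
    down = survival_prob(path_length_km(abs_cos) / energy_gev, dm2)
    return (up - down) / (up + down)


def first_dip(dm2=2.5e-3):
    """Analytic L/E of the first survival-probability minimum."""
    return math.pi / (2 * 1.267 * dm2)


# Scan L/E on a 1 km/GeV grid below the second dip; the minimum should sit next
# to the analytic first-dip value.
scan = min(range(100, 1001), key=survival_prob)
print(round(first_dip(), 1), scan)
```

Down-going neutrinos travel only tens of kilometres and barely oscillate, so they act as an unoscillated reference for the mirror-image up-going bin; that is why the asymmetry dips where the up-going survival probability does.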


Nationwide Comprehensive Epidemiological Study of Rare Diseases in Japan Using a Health Insurance Claims Database

2021. DOI: 10.21203/rs.3.rs-1199845/v1

Author(s): Kota Ninomiya, Masahiro Okura

Keyword(s): Health Insurance; Natural History; Rare Disease; Rare Diseases; Research Work; Insurance Claims; The West; Claims Database; History Of; Health Insurance Claims

Abstract: Background: More than 7,000 diseases are classified as rare diseases, and most have no specific treatment. Disease profiles, such as prevalence and natural history, within the population of a specific country are essential for deciding which diseases to prioritize for drug research and development. In Japan, the profiles of fewer than 2,000 rare diseases, called Nanbyo, have been investigated, whereas non-Nanbyo rare diseases remain largely uninvestigated. We therefore determined the prevalence and natural history of rare diseases in the Japanese population using the National Database of Health Insurance Claims and Specific Health Checkups of Japan, which covered 99.9% of public health insurance claims from hospitals and 97.9% from clinics as of May 2015, and compared them with the data reported in Orphanet. This cross-disease study is the first to analyze rare-disease epidemiology in Japan with high accuracy, disease coverage, and granularity. Results: With the permission of the Ministry of Health, Labour and Welfare, we were provided with the number of patients for approximately 4,500 rare diseases by sex and age over 10 years. About 3,000 of these diseases have equivalent terms in Orphanet and other medical databases. The data show that patients with a rare disease often survive even when the Nanbyo systems do not cover that disease. Moreover, regarding natural history, genetic diseases tend to be diagnosed later in Japan than in the West. The data collected for this work are available in the supplement and on the NanbyoData website. Conclusions: Our work revealed the basic epidemiology and natural history of Japanese patients with rare diseases using a health insurance claims database. The results imply that the coverage of the present Nanbyo systems is inadequate for rare diseases, so fundamental reform might be needed to reduce unfairness between rare diseases.
Moreover, most diseases in Japan follow a tendency similar to that reported in Orphanet. However, some diseases are detected later, partly because fewer clinical genetic tests are available in Japan than in the West. Finally, we hope that our data and analysis will accelerate drug discovery for rare diseases in Japan.


Mild cognitive impairment understanding: an empirical study by data-driven approach

BMC Bioinformatics, 2019, Vol 20 (S15). DOI: 10.1186/s12859-019-3057-1

Author(s): Liyuan Liu, Bingchen Yu, Meng Han, Shanshan Yuan, Na Wang

Keyword(s): Risk Factors; Alzheimer's Disease; Cognitive Decline; Sleep Time; Data Driven; Control And Prevention; Data Driven Approach; Personal Welfare; First Time

Abstract: Background: Cognitive decline has emerged as a significant threat to both public health and personal welfare, and mild cognitive decline/impairment (MCI) can further develop into dementia/Alzheimer’s disease. While treatment of dementia/Alzheimer’s disease can be expensive and is sometimes ineffective, preventing MCI by identifying modifiable risk factors is a complementary and effective strategy. Results: In this study, based on data collected by the Centers for Disease Control and Prevention (CDC) through a nationwide telephone survey, we apply a data-driven approach to re-examine previously identified risk factors and discover new ones. We found that depression, physical health, cigarette use, education level, and sleep time play an important role in cognitive decline, consistent with previous findings. In addition, we show for the first time that other factors, such as arthritis, pulmonary disease, stroke, asthma, and marital status, also contribute to MCI risk; these have been less explored previously. We also use several machine learning and deep learning algorithms to weigh the importance of the various factors contributing to MCI and to predict cognitive decline. Conclusion: Using the data-driven approach, we can identify risk factors that are significantly correlated with disease. These correlations could also be extended to medical diagnoses other than MCI.
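The abstract does not specify its statistical models, so as a simpler, swapped-in illustration of how a single binary risk factor can be checked against MCI status, here is an odds ratio with a Woolf (log-scale) 95% confidence interval. The 2 × 2 counts are hypothetical, and this is not the machine learning or deep learning pipeline the authors used.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a: exposed with MCI      b: exposed without MCI
    c: unexposed with MCI    d: unexposed without MCI
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical counts for one binary factor (e.g. a prior stroke diagnosis).
or_, low, high = odds_ratio_with_ci(30, 70, 10, 90)
print(f"OR {or_:.2f} (95% CI {low:.2f} to {high:.2f})")
```

An interval that excludes 1 suggests an association worth carrying forward; a multivariable model would still be needed to adjust for confounders such as age.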


Modeling Land Suitability for Vitis vinifera in Michigan Using Advanced Geospatial Data and Methods

Atmosphere, 2020, Vol 11 (4), pp. 339. DOI: 10.3390/atmos11040339

Author(s): Dan Wanyama, Erin L. Bunting, Robert Goodwin, Nicholas Weil, Paolo Sabbatini, ...

Keyword(s): Vitis vinifera; Agricultural Production; Agricultural Sector; The State; Land Suitability; Data Driven; Vitis vinifera L.; Wine Grapes; Data Driven Approach; History Of

Michigan (MI) has a long history of diverse agricultural production. Wine grapes (Vitis vinifera L.), one of its most rapidly expanding and profitable crops, have only been cultivated across MI since the 1970s. As of 2014, more than 2100 acres of Vitis vinifera were growing statewide. With such success, there is a push to rapidly develop more vinifera vineyards across the state, and the industry is striving to have 10,000 acres in cultivation by 2024. This study presents a data-driven approach to guide decision making toward this goal. The study models land suitability across the state using environmental, climate, topographic, and land use data to rank portions of the landscape from most to least suitable for vinifera establishment. The models are tested in 17 MI counties. The study found that land suitability for viticulture has expanded, so viticulture can be extended beyond the traditional growing areas. It suggests that warming temperatures have influenced land suitability and demonstrates the application and utility of GIS-based land suitability modeling in viticulture development. Maps produced in this study provide knowledge of climate and environmental trends, which is critical when choosing where to grow and which cultivar to grow. With such resources, growers can be better prepared to invest in and expand this pivotal agricultural sector.
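GIS-based land suitability modeling is often implemented as a weighted linear combination of criterion layers rescaled to a common 0–1 range. The sketch below shows that general technique on tiny hypothetical grids; the criteria (`growing_degree_days`, `slope`, `soil_drainage`), their weights, and the class breaks are assumptions for illustration, not the variables or weights used in the study.

```python
from typing import Dict, List

Grid = List[List[float]]


def weighted_overlay(layers: Dict[str, Grid], weights: Dict[str, float]) -> Grid:
    """Weighted linear combination of criterion grids already scaled to 0-1."""
    total_weight = sum(weights[name] for name in layers)
    first = next(iter(layers.values()))
    rows, cols = len(first), len(first[0])
    score = [[0.0] * cols for _ in range(rows)]
    for name, grid in layers.items():
        w = weights[name] / total_weight  # normalise weights to sum to 1
        for i in range(rows):
            for j in range(cols):
                score[i][j] += w * grid[i][j]
    return score


def classify(value: float, breaks=(0.4, 0.6, 0.8)) -> str:
    """Map a 0-1 suitability score onto ordinal classes."""
    labels = ["not suitable", "marginal", "suitable", "highly suitable"]
    return labels[sum(value >= b for b in breaks)]


layers = {  # hypothetical 2x2 rasters, each criterion pre-scaled to 0-1
    "growing_degree_days": [[0.9, 0.4], [0.7, 0.2]],
    "slope": [[0.8, 0.9], [0.5, 0.3]],
    "soil_drainage": [[1.0, 0.5], [0.6, 0.1]],
}
weights = {"growing_degree_days": 0.5, "slope": 0.2, "soil_drainage": 0.3}
suitability = weighted_overlay(layers, weights)
print([[classify(v) for v in row] for row in suitability])
```

Re-running the same overlay with climate layers from different periods is one way such a model can show suitability shifting as temperatures warm.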
