Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research

2014
Author(s): Àlex Bravo, Janet Piñero, Núria Queralt, Michael Rautschka, Laura I. Furlong

Background: Current biomedical research needs to leverage and exploit the large amount of information reported in publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. Results: By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. Conclusions: BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources, raising several issues on data prioritization and curation. We propose that joint analysis of text-mined data with data curated by experts is a suitable approach to both assess data quality and highlight novel and interesting information.
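
The joint analysis of text-mined and expert-curated associations described in this abstract can be illustrated with plain set operations. The following is only a minimal sketch, not BeFree's actual pipeline: the function name `compare_with_curated`, the gene symbols, and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

# An association is a (gene symbol, disease name) pair. All values below are made up.
Association = Tuple[str, str]


def compare_with_curated(
    text_mined: Set[Association], curated: Set[Association]
) -> Dict[str, object]:
    """Split text-mined associations into those already curated and those that are not."""
    shared = text_mined & curated   # supported by both sources
    novel = text_mined - curated    # found in text only: candidates for review
    fraction = len(shared) / len(text_mined) if text_mined else 0.0
    return {"shared": shared, "novel": novel, "fraction_curated": fraction}


# Toy, hypothetical data for illustration only.
text_mined = {
    ("GENE_A", "depression"),
    ("GENE_B", "depression"),
    ("GENE_C", "asthma"),
    ("GENE_D", "asthma"),
}
curated = {("GENE_A", "depression"), ("GENE_E", "asthma")}

result = compare_with_curated(text_mined, curated)
print(result["fraction_curated"])  # share of text-mined pairs present in the curated set
```

Associations in the `shared` set give a rough handle on data quality, while the `novel` set is what would need prioritization before curation.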

2018
Author(s): George Karystianis, Armita Adily, Peter Schofield, Lee Knight, Clara Galdon, ...

BACKGROUND Vast numbers of domestic violence (DV) incidents are attended by the New South Wales Police Force each year in New South Wales and recorded as both structured quantitative data and unstructured free text in the WebCOPS (Web-based interface for the Computerised Operational Policing System) database regarding the details of the incident, the victim, and person of interest (POI). Although the structured data are used for reporting purposes, the free text remains untapped for DV reporting and surveillance purposes. OBJECTIVE In this paper, we explore whether text mining can automatically identify mental health disorders from this unstructured text. METHODS We used a training set of 200 DV recorded events to design a knowledge-driven approach based on lexical patterns in text suggesting mental health disorders for POIs and victims. RESULTS The precision returned from an evaluation set of 100 DV events was 97.5% and 87.1% for mental health disorders related to POIs and victims, respectively. After applying our approach to a large-scale corpus of almost a half million DV events, we identified 77,995 events (15.83%) that mentioned mental health disorders, with 76.96% (60,032/77,995) of those linked to POIs versus 16.47% (12,852/77,995) for the victims and 6.55% (5111/77,995) for both. Depression was the most common mental health disorder mentioned in both victims (22.30%, 3258) and POIs (18.73%, 8918), followed by alcohol abuse for POIs (12.24%, 5829) and various anxiety disorders (eg, panic disorder, generalized anxiety disorder) for victims (11.43%, 1671). CONCLUSIONS The results suggest that text mining can automatically extract targeted information from police-recorded DV events to support further public health research into the nexus between mental health disorders and DV.
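
A knowledge-driven approach based on lexical patterns, as mentioned in the METHODS above, could look roughly like the sketch below. The term lists, cue phrases, and the `find_mentions` helper are hypothetical and far simpler than anything a real system would need; they only show how a pattern can tie a disorder mention to a role (POI or victim).

```python
import re

# Hypothetical, deliberately tiny lexicons; not the ones used in the study.
DISORDER_TERMS = ["depression", "anxiety", "bipolar disorder", "schizophrenia"]
ROLE_TERMS = {
    "poi": ["poi", "person of interest"],
    "victim": ["victim"],
}


def build_pattern(role_words, disorder_terms):
    """Role word, optional auxiliary verb, a cue phrase, then a disorder term."""
    roles = "|".join(re.escape(w) for w in role_words)
    disorders = "|".join(re.escape(t) for t in disorder_terms)
    return re.compile(
        rf"\b(?:{roles})\b\W+(?:(?:is|was|has been)\W+)?"
        rf"(?:suffers from|suffering from|diagnosed with)\W+"
        rf"(?P<disorder>{disorders})\b",
        re.IGNORECASE,
    )


PATTERNS = {role: build_pattern(words, DISORDER_TERMS) for role, words in ROLE_TERMS.items()}


def find_mentions(text):
    """Return the disorders linked to each role by the patterns above."""
    return {
        role: sorted({m.group("disorder").lower() for m in pattern.finditer(text)})
        for role, pattern in PATTERNS.items()
    }


narrative = (
    "The victim stated that the POI suffers from depression. "
    "The victim was diagnosed with anxiety."
)
print(find_mentions(narrative))
```

Requiring the cue phrase to follow the role word directly is a design choice that favors precision over recall: the first sentence mentions the victim, but the disorder is attributed only to the POI.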


2017
Author(s): Halil Kilicoglu

Abstract: An estimated quarter of a trillion US dollars is invested in the biomedical research enterprise annually. There is growing alarm that a significant portion of this investment is wasted, due to problems in reproducibility of research findings and in the rigor and integrity of research conduct and reporting. Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research. Research activity is primarily communicated via textual artifacts, ranging from grant applications to journal publications. These artifacts can be both the source and the end result of practices leading to research waste. For example, an article may describe a poorly designed experiment, or the authors may reach conclusions not supported by the evidence presented. In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part towards enhancing research integrity and rigor. In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information overload, and accurate citation/enhanced bibliometrics. We review the existing methods and tools for specific tasks, if they exist, or discuss relevant research that can provide guidance for future work. With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can add checks and balances that promote responsible research practices and can provide significant benefits for the biomedical research enterprise. Supplementary information: Supplementary material is available at BioRxiv.


2011, Vol 5 (4)
Author(s): K. K. Tan, A. S. Putra, L. P. Pham, T. H. Lee, M. Salto-Tellez, ...

Tissue microarray (TMA) is based on the idea of applying miniaturization and a high-throughput approach to hybridization-based analyses of tissues. It facilitates large-scale biomedical research in a single experiment, making it one of the most commonly used technologies in translational research. A critical analysis of the existing TMA instruments indicates that there are potential constraints in terms of portability, apart from costs and complexity. This paper presents the development of an affordable, configurable, and portable TMA instrument to allow an efficient collection of tissues, especially in instrument-to-tissue scenarios. The purely mechanical instrument requires no energy sources other than the user, and is lightweight, portable, and simple to use.


2017
Author(s): Gabriel Rosenfeld, Dawei Lin

Abstract: While the impact of biomedical research has traditionally been measured using bibliographic metrics such as citation or journal impact factor, the data themselves are an output which can be directly measured to provide additional context about a publication's impact. Data are a resource that can be repurposed and reused, providing dividends on the original investment used to support the primary work. Moreover, they are the cornerstone upon which a tested hypothesis is rejected or accepted and specific scientific conclusions are reached. Understanding how and where data are being produced enhances the transparency and reproducibility of the biomedical research enterprise. Most biomedical data are not directly deposited in data repositories and are instead found in the publication within figures or attachments, making them hard to measure. We attempted to address this challenge by using recent advances in word embedding to identify the technical and methodological features of terms used in the free text of articles' methods sections. We created term usage signatures for five types of biomedical research data, which were used in univariate clustering to correctly identify a large fraction of positive control articles and a set of manually annotated articles where generation of data types could be validated. The approach was then used to estimate the fraction of PLOS articles generating each biomedical data type over time. Out of all PLOS articles analyzed (n = 129,918), ~7%, 19%, 12%, 18%, and 6% generated flow cytometry, immunoassay, genomic microarray, microscopy, and high-throughput sequencing data, respectively. The estimate portends a vast amount of biomedical data being produced: in 2016, if other publishers generated a similar amount of data, then roughly 40,000 NIH-funded research articles would produce ~56,000 datasets consisting of the five data types we analyzed. One Sentence Summary: Application of a word-embedding model trained on the methods sections of research articles allows for estimation of the production of diverse biomedical data types using text mining.


2020, Vol 15 (7), pp. 750-757
Author(s): Jihong Wang, Yue Shi, Xiaodan Wang, Huiyou Chang

Background: At present, using computer methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and drug relocation processes. The potential DTIs identified by machine learning methods can provide guidance in biochemical or clinical experiments. Objective: The goal of this article is to combine the latest network representation learning methods for drug-target prediction research, improve model prediction capabilities, and promote new drug development. Methods: We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The learning method based on the LINE network can effectively learn hidden features of drugs, targets, diseases, and other entities from the network topology. The combination of features learned through multiple networks can enhance the expression ability. RF is an effective method of supervised learning. Therefore, the LINE-RF combination method is a widely applicable method.
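
The step of constructing binary classification samples from learned features can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the embeddings are made-up two-dimensional vectors standing in for LINE output, the `build_samples` helper is hypothetical, and treating randomly drawn unknown drug-target pairs as negatives is an assumption the abstract does not confirm.

```python
import random
from typing import Dict, List, Set, Tuple

Vector = List[float]
Pair = Tuple[str, str]


def build_samples(
    drug_emb: Dict[str, Vector],
    target_emb: Dict[str, Vector],
    known_pairs: Set[Pair],
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Concatenate drug and target embeddings into one feature vector per pair.

    Known interactions are labeled 1. An equal number of pairs with no known
    interaction is sampled and labeled 0 (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    all_pairs = [(d, t) for d in sorted(drug_emb) for t in sorted(target_emb)]
    unknown = [p for p in all_pairs if p not in known_pairs]
    negatives = rng.sample(unknown, k=min(len(known_pairs), len(unknown)))

    features: List[Vector] = []
    labels: List[int] = []
    for drug, target in sorted(known_pairs):
        features.append(drug_emb[drug] + target_emb[target])  # list concatenation
        labels.append(1)
    for drug, target in negatives:
        features.append(drug_emb[drug] + target_emb[target])
        labels.append(0)
    return features, labels


# Toy, hypothetical embeddings.
drug_emb = {"D1": [0.1, 0.9], "D2": [0.8, 0.2]}
target_emb = {"T1": [0.3, 0.3], "T2": [0.7, 0.1], "T3": [0.5, 0.5]}
known_pairs = {("D1", "T1"), ("D2", "T3")}

X, y = build_samples(drug_emb, target_emb, known_pairs)
print(len(X), y)
```

The resulting `X` and `y` have the shape a standard classifier expects, so they could be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`.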


2020
Author(s): Amir Karami, Brandon Bookstaver, Melissa Nolan

BACKGROUND The COVID-19 pandemic has impacted nearly all aspects of life and has posed significant threats to international health and the economy. Given the rapidly unfolding nature of the current pandemic, there is an urgent need to streamline literature synthesis of the growing scientific research to elucidate targeted solutions. While traditional systematic literature review studies provide valuable insights, these studies have restrictions, including analyzing a limited number of papers, having various biases, being time-consuming and labor-intensive, focusing on a few topics, being incapable of trend analysis, and lacking data-driven tools. OBJECTIVE This study addresses these restrictions in the literature and practice by analyzing two biomedical concepts, clinical manifestations of disease and therapeutic chemical compounds, with text mining methods in a corpus containing COVID-19 research papers, and by finding associations between the two biomedical concepts. METHODS This research collected papers representing COVID-19 pre-prints and peer-reviewed research published in 2020. We used frequency analysis to find highly frequent manifestations and therapeutic chemicals, representing the importance of the two biomedical concepts. This study also applied topic modeling to find the relationship between the two biomedical concepts. RESULTS We analyzed 9,298 research papers published through May 5, 2020 and found 3,645 disease-related and 2,434 chemical-related articles. The most frequent clinical manifestations of disease terminology included COVID-19, SARS, cancer, pneumonia, fever, and cough. The most frequent chemical-related terminology included Lopinavir, Ritonavir, Oxygen, Chloroquine, Remdesivir, and water. Topic modeling provided 25 categories showing relationships between our two overarching categories. These categories represent statistically significant associations between multiple aspects of each category, some connections of which were novel and not previously identified by the scientific community. CONCLUSIONS Appreciation of this context is vital due to the lack of a systematic large-scale literature review survey and the importance of fast literature review during the current COVID-19 pandemic for developing treatments. This study is beneficial to researchers for obtaining a macro-level picture of literature, to educators for knowing the scope of literature, to journals for exploring most discussed disease symptoms and pharmaceutical targets, and to policymakers and funding agencies for creating scientific strategic plans regarding COVID-19.
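
The frequency analysis mentioned in the METHODS can be sketched with the standard library. The abstract does not say whether terms were counted per occurrence or per paper, so counting the number of papers that mention each term is an assumption here. The lexicons, the example texts, and the `document_frequency` helper are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set


def document_frequency(papers: Iterable[str], lexicon: Set[str]) -> Counter:
    """Count, for each lexicon term, how many papers mention it at least once."""
    counts: Counter = Counter()
    for text in papers:
        tokens = set(re.findall(r"[a-z0-9\-]+", text.lower()))
        for term in lexicon:
            if term in tokens:
                counts[term] += 1
    return counts


# Toy, made-up texts and single-word lexicons for illustration only.
papers = [
    "Fever and cough were common; remdesivir was evaluated.",
    "Patients with pneumonia and fever received lopinavir and ritonavir.",
    "Chloroquine did not reduce fever.",
]
manifestations = {"fever", "cough", "pneumonia"}
chemicals = {"remdesivir", "lopinavir", "ritonavir", "chloroquine"}

manifestation_counts = document_frequency(papers, manifestations)
chemical_counts = document_frequency(papers, chemicals)
print(manifestation_counts.most_common(1))
```

Multi-word terms would need phrase matching rather than the single-token lookup used in this sketch.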


Cancers, 2021, Vol 13 (9), pp. 2111
Author(s): Bo-Wei Zhao, Zhu-Hong You, Lun Hu, Zhen-Hao Guo, Lei Wang, ...

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
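
DeepWalk learns node representations from truncated random walks over the graph, which is how it reaches beyond first-order neighbors. The sketch below shows only the walk-generation step on a toy drug-target graph. The graph, the parameter values, and the `random_walks` helper are hypothetical, and the subsequent skip-gram training that turns walks into embeddings is omitted.

```python
import random
from typing import Dict, List


def random_walks(
    graph: Dict[str, List[str]],
    walk_length: int,
    walks_per_node: int,
    seed: int = 0,
) -> List[List[str]]:
    """Generate uniform random walks, starting `walks_per_node` times from every node."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks: List[List[str]] = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy, made-up undirected drug-target graph as adjacency lists.
graph = {
    "D1": ["T1", "T2"],
    "D2": ["T2"],
    "T1": ["D1"],
    "T2": ["D1", "D2"],
}

walks = random_walks(graph, walk_length=5, walks_per_node=2)
print(walks[0])
```

Nodes that co-occur within a walk, such as `D2` and `T1` via `T2` and `D1`, are treated as context for each other, which is one way to picture the high-order neighbor information the abstract refers to.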


2021, Vol 14 (1)
Author(s): Masahiro Inoue, Shota Arichi, Tsuyoshi Hachiya, Anna Ohtera, Seok-Won Kim, ...

Abstract Objective: To assess the applicability of direct-to-consumer (DTC) genetic testing to translational research for obtaining new knowledge on relationships between drug target genes and diseases, we examined the potential of these data by associating SNPs with disease-related phenotype information collected from healthy individuals. Results: A total of 12,598 saliva samples were collected from the customers of a commercial SNP analysis service, and a web survey was conducted to collect phenotype information. The collected dataset was similar to the Japanese data but clearly different from the other populations in the 1000 Genomes Project dataset. After confirmation of the well-known relationship between ALDH2 and alcohol sensitivity, a Phenome-Wide Association Study (PheWAS) was performed to find associations between pre-selected drug target genes and all the phenotypes. An association was found between GRIN2B and multiple phenotypes related to depression, which is considered reliable based on previous reports on the biological function of the GRIN2B protein and its relationship with depression. These results suggest the possibility of using SNPs and phenotype information collected from healthy individuals as a translational research tool for drug discovery to find relationships between a gene and a disease, provided that individuals in pre-disease states can be identified by a properly designed questionnaire.


2021, Vol 13 (10), pp. 5717
Author(s): Mian Muhammad-Ahson Aslam, Hsion-Wen Kuo, Walter Den, Muhammad Usman, Muhammad Sultan, ...

As the world's human population and industrialization keep growing, the water availability issue has forced scientists, engineers, and legislators of water supply industries to better manage water resources. Pollutant removal from wastewater is crucial to ensure the quality of available water resources (including natural water bodies or reclaimed waters). Diverse techniques have been developed to deal with water quality concerns. Carbon-based nanomaterials, especially carbon nanotubes (CNTs) with their high specific surface area and associated adsorption sites, have drawn a special focus in environmental applications, especially water and wastewater treatment. This critical review summarizes recent developments and adsorption behaviors of CNTs used to remove organics or heavy metal ions from contaminated waters via adsorption and inactivation of biological species associated with CNTs. Foci include CNT synthesis, purification, and surface modification or functionalization, followed by their characterization methods and the effect of water chemistry on adsorption capacities and removal mechanisms. Functionalized CNTs have been proven to be promising nanomaterials for the decontamination of waters due to their high adsorption capacity. However, most functionalized CNT applications are limited to lab-scale experiments. The feasibility of their large-scale/industrial application, with cost-effective ways of synthesis and assessments of their toxicity with better simulation of adsorption mechanisms, still needs to be studied.


2021, pp. 089801012110627
Author(s): Elizabeth Kinchen

The purpose of this quantitative, descriptive, exploratory study was to gauge the degree to which nurse practitioners (NPs) incorporate holistic nursing values in their care, with a special focus on shared decision-making (SDM), using the Nurse Practitioner Holistic Caring Instrument (NPHCI), an investigator-developed scale. A single open-ended question inviting free-text comment was also included, soliciting participants' views on the holistic attributes of their care. A convenience sample of NPs (n = 573) was recruited from a southeastern U.S. state Board of Nursing's (BON) publicly available list of licensed NPs. Results suggest that NPs do indeed perceive their care to be holistic, and that they routinely incorporate elements of SDM in their care. Highest scores were accorded to listening, taking time to talk to patients, knowledge of physical condition, soliciting patient input in care decisions, considering how other areas of a patient's life may affect their medical condition, and attention to "what matters most" to the patient. Age, gender, level of education, practice specialty, and location were also associated with inclusion of holistic care. Free-text responses revealed that NPs value holistic care and desire to practice holistically, but identify "lack of time" to incorporate or practice holistic care as a barrier.

