Toward Preparing a Knowledge Base to Explore Potential Drugs and Biomedical Entities Related to COVID-19: Automated Computational Approach (Preprint)

2020 ◽  
Author(s):  
Junaed Younus Khan ◽  
Md Tawkat Islam Khondaker ◽  
Iram Tazim Hoque ◽  
Hamada R H Al-Absi ◽  
Mohammad Saifur Rahman ◽  
...  

BACKGROUND Novel coronavirus disease 2019 (COVID-19) is taking a huge toll on public health. Along with non-therapeutic preventive measures, scientific efforts are currently focused mainly on the development of vaccines and on pharmacological treatment with existing drugs. Summarizing evidence from the scientific literature on potential COVID-19 treatments in a single platform would help the scientific community explore the opportunities in a systematic fashion. OBJECTIVE The aim of this study is to explore the potential drugs and biomedical entities related to coronavirus-related diseases, including COVID-19, that are mentioned in the scientific literature, through an automated computational approach. METHODS We mined information from publicly available scientific literature and related public resources. Six topic-specific dictionaries, covering human genes, human miRNAs, diseases, the Protein Data Bank, drugs, and drug side effects, were integrated to mine all scientific evidence related to COVID-19. We employed an automated literature mining and labeling system with a novel approach, based on natural language processing, sentiment analysis, and deep learning, to measure the effectiveness of drugs against diseases. We also applied cosine similarity to confidently infer associations between diseases and genes. RESULTS Based on the literature mining, we identified 1805 diseases, 2454 drugs, and 1910 genes related to coronavirus-related diseases, including COVID-19. Integrating the extracted information, we developed the first knowledge base platform dedicated to COVID-19, which highlights a list of potential drugs and related biomedical entities. For COVID-19, we highlighted multiple case studies on existing drugs along with a confidence score for their applicability in the treatment plan.
Based on our computational method, we found that Remdesivir, Statins, Dexamethasone, and Ivermectin could be considered potential effective drugs for improving clinical status and lowering mortality in patients hospitalized with COVID-19. We also found that Hydroxychloroquine could not be considered an effective drug for COVID-19. The resulting knowledge base is made available as an open-source tool named COVID-19Base. CONCLUSIONS Proper investigation of the mined biomedical entities, along with the identified interactions among them, would help the research community discover possible therapeutic treatments for COVID-19.
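The cosine-similarity step mentioned in the Methods can be sketched as follows. The entity names and occurrence vectors are hypothetical, since the paper's exact feature construction is not given here; the point is only how the similarity between a disease profile and a gene profile is computed.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two term-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical occurrence vectors over a shared set of five abstracts:
# 1 means the entity is mentioned in that abstract, 0 means it is not.
disease_vec = [1, 0, 1, 1, 0]
gene_vec    = [1, 0, 1, 0, 0]
score = cosine_similarity(disease_vec, gene_vec)  # close to 1.0 => likely associated
```

A high score suggests the disease and gene tend to be mentioned in the same literature, which is the signal the knowledge base uses to infer an association.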

10.2196/21648 ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. e21648


2019 ◽  
Vol 20 (2) ◽  
pp. 302 ◽  
Author(s):  
Jingzhong Gan ◽  
Jie Qiu ◽  
Canshang Deng ◽  
Wei Lan ◽  
Qingfeng Chen ◽  
...  

Protein phosphorylation is an important chemical modification catalyzed by kinases and plays important roles in many cellular processes. Predicting kinase–substrate interactions is vital to understanding the mechanisms of many diseases. Many computational methods have been proposed to identify kinase–substrate interactions, but prediction accuracy still needs to be improved; it is therefore necessary to develop an efficient computational method for this task. In this paper, we propose a novel computational approach, KSIMC, to identify kinase–substrate interactions based on matrix completion. First, kinase similarity and substrate similarity are calculated by aligning kinase–kinase and substrate–substrate sequences, respectively. Then, the original association network is adjusted based on these similarities. Finally, matrix completion is used to predict potential kinase–substrate interactions. The experimental results show that our method outperforms state-of-the-art algorithms. Furthermore, relevant databases and the scientific literature verify the effectiveness of our algorithm for identifying new kinase–substrate interactions.
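The matrix-completion step can be illustrated with a hard-impute sketch (iteratively filling missing entries with a low-rank SVD approximation). The abstract does not specify the exact completion algorithm used by KSIMC, and the toy kinase–substrate association matrix below is hypothetical.

```python
import numpy as np

def complete_matrix(A, mask, rank=2, n_iter=100):
    """Hard-impute: repeatedly replace the unobserved entries of A with
    their values in a rank-truncated SVD reconstruction."""
    X = np.where(mask, A, 0.0)  # start with missing entries set to 0
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s[rank:] = 0.0                      # keep only the top `rank` singular values
        low_rank = (U * s) @ Vt             # low-rank reconstruction
        X = np.where(mask, A, low_rank)     # observed entries stay fixed
    return X

# Toy association matrix: rows = kinases, columns = substrates (hypothetical data).
# Entry (1, 1) is unobserved and will be predicted.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
mask = np.array([[True, True,  True],
                 [True, False, True],
                 [True, True,  True]])
X_hat = complete_matrix(A, mask, rank=1)
```

Because kinase row 1 resembles kinase row 0, the low-rank structure pushes the predicted score for the unobserved pair toward the observed pattern, which is the intuition behind completion-based interaction prediction.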


2021 ◽  
Vol 6 (2) ◽  
pp. 106-112
Author(s):  
Arif Himawan ◽  
Muhammad Rifqi Maarif ◽  
Ulfi Saidata Aesyi

The main purpose of this paper is to examine the dominant topics about Business Intelligence on the micro-blogging site Twitter. A total of 7,153 tweets were collected from the Twitter API. Text mining and natural language processing were used to analyze the dominant topics among those tweets. A computational method was used to count the hashtags that most frequently appear together with the Business Intelligence hashtag. Twitter users are numerous and scattered around the world, with a diverse range of expertise that can give a new perspective on a subject, one that may not have been predicted before. For topics related to Business Intelligence, for example, the dominant general topics discussed in the scientific literature are data management, analytics, and machine learning. The result contributes to understanding the dominant topics about Business Intelligence, which can help researchers position their research.
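The hashtag co-occurrence count described above can be sketched as follows; the tweets below are hypothetical stand-ins for the 7,153 collected ones.

```python
import re
from collections import Counter

def top_cooccurring_hashtags(tweets, anchor="#businessintelligence", n=3):
    """Count hashtags that appear in the same tweet as the anchor hashtag."""
    counts = Counter()
    for tweet in tweets:
        tags = [t.lower() for t in re.findall(r"#\w+", tweet)]
        if anchor in tags:
            counts.update(t for t in tags if t != anchor)
    return counts.most_common(n)

# Hypothetical tweets
tweets = [
    "Dashboards 101 #BusinessIntelligence #analytics #data",
    "New course #BusinessIntelligence #MachineLearning #analytics",
    "Weekend reading #datascience",
]
top = top_cooccurring_hashtags(tweets)  # most frequent co-occurring hashtags
```

Ranking the resulting counts is exactly the "most frequent hashtag that appears together with the Business Intelligence hashtag" computation the paper describes.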


2018 ◽  
Author(s):  
Stephen Joseph Wilson ◽  
Angela Dawn Wilkins ◽  
Matthew V. Holt ◽  
Byung Kwon Choi ◽  
Daniel Konecki ◽  
...  

ABSTRACT The scientific literature is vast, growing, and increasingly specialized, making it difficult to connect disparate observations across subfields. To address this problem, we sought to develop automated hypothesis generation by networking, at scale, the MeSH terms curated by the National Library of Medicine. The result is a MeSH Term Objective Reasoning (MeTeOR) approach that tallies associations among genes, drugs, and diseases from PubMed and predicts new ones. Comparisons to reference databases and algorithms show that MeTeOR tends to be more reliable. We also show that many predictions based on the literature prior to 2014 were published subsequently. In a practical application, we experimentally validated a surprising new association found by MeTeOR between the Epidermal Growth Factor Receptor (EGFR) and CDK2. We conclude that MeTeOR generates useful hypotheses from the literature (http://meteor.lichtargelab.org/). AUTHOR SUMMARY The large size and exponential expansion of the scientific literature form a bottleneck to accessing and understanding published findings. Manual curation and Natural Language Processing (NLP) aim to address this bottleneck by summarizing and disseminating the knowledge within articles as key relationships (e.g., TP53 relates to cancer). However, these methods compromise on either coverage or accuracy, respectively. To mitigate this compromise, we proposed using manually assigned keywords (MeSH terms) to extract relationships from publications and demonstrated comparable coverage but higher accuracy than current NLP methods. Furthermore, we combined the extracted knowledge with semi-supervised machine learning to create hypotheses to guide future work, and discovered a direct interaction between two important cancer genes.
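MeTeOR's tallying of associations from curated keywords could look roughly like this. The article annotations below are hypothetical, and the plain pair-counting scheme is a simplification of the published method.

```python
from collections import Counter
from itertools import combinations

def tally_associations(articles):
    """Tally pairwise co-occurrence of curated keywords across articles.
    Each article is a set of MeSH-style terms; every unordered pair that
    appears together in an article increments that pair's tally."""
    pair_counts = Counter()
    for terms in articles:
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Hypothetical MeSH-style annotations for three articles
articles = [
    {"TP53", "Neoplasms"},
    {"TP53", "Neoplasms", "Apoptosis"},
    {"EGFR", "Neoplasms"},
]
pairs = tally_associations(articles)
```

Pairs with high tallies form the edges of the literature network from which new gene–drug–disease hypotheses are then predicted.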


2020 ◽  
Vol 11 (1) ◽  
pp. 24
Author(s):  
Jin Tao ◽  
Kelly Brayton ◽  
Shira Broschat

Advances in genome sequencing technology and computing power have brought about the explosive growth of sequenced genomes in public repositories with a concomitant increase in annotation errors. Many protein sequences are annotated using computational analysis rather than experimental verification, leading to inaccuracies in annotation. Confirmation of existing protein annotations is urgently needed before misannotation becomes even more prevalent due to error propagation. In this work we present a novel approach for automatically confirming the existence of manually curated information with experimental evidence of protein annotation. Our ensemble learning method uses a combination of recurrent convolutional neural network, logistic regression, and support vector machine models. Natural language processing in the form of word embeddings is used with journal publication titles retrieved from the UniProtKB database. Importantly, we use recall as our most significant metric to ensure the maximum number of verifications possible; results are reported to a human curator for confirmation. Our ensemble model achieves 91.25% recall, 71.26% accuracy, 65.19% precision, and an F1 score of 76.05% and outperforms the Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT) model with fine-tuning using the same data.
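The recall-first evaluation described above can be made concrete with a small metrics sketch; the labels below are hypothetical, not the paper's data. Prioritizing recall means accepting more false positives (lower precision) so that as few verifiable annotations as possible are missed before human curation.

```python
def recall_precision_f1(y_true, y_pred):
    """Compute recall, precision, and F1 for binary labels (1 = verified annotation)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

# Hypothetical gold labels and ensemble predictions
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
r, p, f1 = recall_precision_f1(y_true, y_pred)
```

Here one verified annotation is missed (a false negative) and one is over-predicted (a false positive); tuning the ensemble's threshold trades one against the other, and the paper tunes toward recall.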


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Chris Bauer ◽  
Ralf Herwig ◽  
Matthias Lienhard ◽  
Paul Prasse ◽  
Tobias Scheffer ◽  
...  

Abstract Background There is a huge body of scientific literature describing the relations between tumor types and anti-cancer drugs. The vast amount of scientific literature makes it impossible for researchers and physicians to extract all relevant information manually. Methods In order to cope with the large amount of literature, we applied an automated text mining approach to assess the relations between the 30 most frequent cancer types and 270 anti-cancer drugs. We applied two different approaches: classical text mining based on named entity recognition, and an AI-based approach employing word embeddings. The consistency of the literature mining results was validated with three independent methods: first, using data from FDA approvals; second, using experimentally measured IC50 cell line data; and third, using clinical patient survival data. Results We demonstrated that the automated text mining was able to successfully assess the relations between cancer types and anti-cancer drugs. All validation methods showed a good correspondence between the results from literature mining and the independent confirmatory approaches. The relations between the most frequent cancer types and the drugs employed for their treatment were visualized in a large heatmap. All results are accessible in an interactive web-based knowledge base at the following link: https://knowledgebase.microdiscovery.de/heatmap. Conclusions Our approach is able to assess the relations between compounds and cancer types in an automated manner. Both cancer types and compounds could be grouped into different clusters. Researchers can use the interactive knowledge base to inspect the presented results and follow their own research questions, for example the identification of novel indication areas for known drugs.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Bo Sun ◽  
Fei Zhang ◽  
Jing Li ◽  
Yicheng Yang ◽  
Xiaolin Diao ◽  
...  

Abstract Background With the development and application of medical information systems, semantic interoperability is essential for accurate and advanced health-related computing and for electronic health record (EHR) information sharing. The openEHR approach can improve semantic interoperability; one of its key strengths is that it allows the reuse of existing archetypes. The crucial problem is how to improve precision and resolve ambiguity in archetype retrieval. Method Based on query expansion technology and the Word2Vec model from Natural Language Processing (NLP), we propose finding synonyms as substitutes for the original search terms in archetype retrieval. Test sets at different medical professional levels are used to verify feasibility. Result Applying the approach to each original search term (n = 120) in the test sets, a total of 69,348 substitutes were constructed. Precision at 5 (P@5) was improved by 0.767 on average; for the best result, P@5 was up to 0.975. Conclusions We introduce a novel approach that uses NLP technology and a corpus to find synonyms as substitutes for original search terms. Compared to simply mapping the elements contained in openEHR to an external dictionary, this approach can greatly improve precision and resolve ambiguity in retrieval tasks. This helps promote the application of openEHR and advance EHR information sharing.
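The synonym-substitution idea can be sketched with toy pretrained embeddings: rank every other vocabulary term by cosine similarity to the query term and take the nearest neighbours as substitutes. The terms and vectors below are hypothetical; a real system would use vectors trained with Word2Vec on a medical corpus.

```python
import math

# Hypothetical pretrained embedding vectors for clinical terms
embeddings = {
    "hypertension":        [0.90, 0.10, 0.00],
    "high_blood_pressure": [0.85, 0.15, 0.05],
    "diabetes":            [0.10, 0.90, 0.00],
    "glucose":             [0.15, 0.85, 0.10],
}

def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def expand_query(term, k=1):
    """Return the k nearest neighbours of `term` in embedding space as substitutes."""
    ranked = sorted(
        ((other, _cos(embeddings[term], vec))
         for other, vec in embeddings.items() if other != term),
        key=lambda x: -x[1],
    )
    return [w for w, _ in ranked[:k]]

substitutes = expand_query("hypertension", k=1)
```

Each substitute is then issued as an additional retrieval query, which is how expansion raises P@5 when the user's original wording does not match the archetype's terminology.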


2020 ◽  
Vol 19 (3) ◽  
pp. 178
Author(s):  
Jefferson Petto ◽  
Igor Macedo De Oliveira ◽  
Alice Miranda De Oliveira ◽  
Marvyn De Santana Do Sacramento

The earliest accounts of scientific thought date back thousands of years BC, when problems in the daily lives of our predecessors led to the search for effective and replicable forms of resolution. Nowadays, with the advent of science and technology, health professionals' decision making has been organized based on the analysis of the diverse evidence available in the scientific literature. This process has been identified as Evidence-Based Practice (EBP)...


Author(s):  
Yoosin Kim ◽  
Michelle Jeong ◽  
Seung Ryul Jeong

In light of recent research that has begun to examine the link between textual “big data” and social phenomena such as stock price increases, this chapter takes a novel approach to treating news as big data by proposing the intelligent investment decision-making support model based on opinion mining. In an initial prototype experiment, the researchers first built a stock domain-specific sentiment dictionary via natural language processing of online news articles and calculated sentiment scores for the opinions extracted from those stories. In a separate main experiment, the researchers gathered 78,216 online news articles from two different media sources to not only make predictions of actual stock price increases but also to compare the predictive accuracy of articles from different media sources. The study found that opinions that are extracted from the news and treated with proper sentiment analysis can be effective in predicting changes in the stock market.
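The dictionary-based sentiment scoring described above can be sketched as follows; the dictionary entries and headline are hypothetical, not the authors' stock-domain lexicon.

```python
# Hypothetical stock-domain sentiment dictionary (positive > 0, negative < 0)
sentiment_dict = {"surge": 1.0, "rally": 0.8, "growth": 0.5,
                  "plunge": -1.0, "loss": -0.7, "lawsuit": -0.6}

def sentiment_score(text):
    """Average dictionary score over the sentiment-bearing words in a headline."""
    words = text.lower().split()
    scores = [sentiment_dict[w] for w in words if w in sentiment_dict]
    return sum(scores) / len(scores) if scores else 0.0

score = sentiment_score("Shares surge after strong growth report")  # positive signal
```

Aggregating such scores over the day's news for a stock yields the opinion signal that the model then correlates with subsequent price movement.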


Semantic Web technology is not as new as most of us assume; it has evolved over the years. Linked Data is the name recently given to the Semantic Web. The Semantic Web is a continuation of Web 2.0 and is intended to replace existing technologies. It builds on natural language processing and provides solutions to many prevailing issues. Web 3.0, the current version of the Semantic Web, caters to the information needs of half of the population on earth. This paper links two important current concerns, the security of information and enforced online education due to COVID-19, with the Semantic Web. The steganography requirement for the Semantic Web is discussed in detail, since encryption alone is inadequate to provide protection. Web 2.0 issues concerning online education, and Semantic Web solutions to them, are discussed. An extensive literature survey was conducted on the architecture of Web 3.0, the history of online education, and security architecture. Finally, the Semantic Web is here to stay, and data hiding combined with encryption makes it robust.

