Integration of Neuroimaging and Microarray Datasets through Mapping and Model-Theoretic Semantic Decomposition of Unstructured Phenotypes

2009 ◽  
Vol 8 ◽  
pp. CIN.S1046
Author(s):  
Spiro P. Pantazatos ◽  
Jianrong Li ◽  
Paul Pavlidis ◽  
Yves A. Lussier

An approach towards heterogeneous neuroscience dataset integration is proposed that uses Natural Language Processing (NLP) and a knowledge-based phenotype organizer system (PhenOS) to link ontology-anchored terms to underlying data from each database, and then maps these terms based on a computable model of disease (SNOMED CT®). The approach was implemented using sample datasets from fMRIDC, GEO, The Whole Brain Atlas and Neuronames, and allowed for complex queries such as “List all disorders with a finding site of brain region X, and then find the semantically related references in all participating databases based on the ontological model of the disease or its anatomical and morphological attributes”. Precision of the NLP-derived coding of the unstructured phenotypes in each dataset was 88% (n = 50), and precision of the semantic mapping between these terms across datasets was 98% (n = 100). To our knowledge, this is the first example of the use of both semantic decomposition of disease relationships and hierarchical information found in ontologies to integrate heterogeneous phenotypes across clinical and molecular datasets.
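A minimal sketch of the kind of query described above. The hierarchy, disorder annotations, and database records below are toy stand-ins, not the paper's SNOMED CT® model or the actual fMRIDC/GEO holdings: the point is only how an "is-a" chain over anatomical sites lets one query resolve semantically related records across databases.

```python
# Toy "is-a" hierarchy over anatomical finding sites (child -> parent).
IS_A = {
    "hippocampus": "temporal lobe",
    "temporal lobe": "cerebrum",
    "cerebrum": "brain",
}

# Disorders annotated with a finding site (hypothetical term bindings).
FINDING_SITE = {
    "Alzheimer disease": "hippocampus",
    "temporal lobe epilepsy": "temporal lobe",
    "myocardial infarction": "heart",
}

# Records in participating databases, keyed by disorder term (toy data).
DATABASES = {
    "fMRIDC": {"Alzheimer disease": ["study-17"]},
    "GEO": {"temporal lobe epilepsy": ["GSE-toy-1"]},
}

def is_within(site, region):
    """True if `site` equals `region` or lies under it in the is-a chain."""
    while site is not None:
        if site == region:
            return True
        site = IS_A.get(site)
    return False

def disorders_with_site(region):
    """List all disorders whose finding site falls under `region`."""
    return sorted(d for d, s in FINDING_SITE.items() if is_within(s, region))

def linked_records(region):
    """Pull the semantically related records from each participating database."""
    hits = disorders_with_site(region)
    return {db: {d: recs[d] for d in hits if d in recs}
            for db, recs in DATABASES.items()}

print(disorders_with_site("brain"))  # cardiac disorder is excluded
print(linked_records("brain"))
```

The hierarchy walk is what makes the query "semantic": a disorder annotated at the hippocampus level still matches a query at the brain level.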

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pilar López-Úbeda ◽  
Alexandra Pomares-Quimbaya ◽  
Manuel Carlos Díaz-Galiano ◽  
Stefan Schulz

Abstract Background Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS Metathesaurus or SNOMED CT, are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most such terms are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong. Results This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical sub-domains. We use PubMed queries that generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called the Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining improvements of 6 percentage points in the F-measure over the baseline using a Multilayer Perceptron, thus supporting the hypothesis that a specialized term set improves NLP tasks. Conclusion The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent, broad-scope vocabulary.
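To make the pipeline concrete, here is an illustrative sketch of per-sub-domain term weighting. The two tiny "corpora" and the relevance/discrimination formula below are toy assumptions for illustration, not SCOVACLIS's actual metrics: relevance is approximated by in-domain frequency, and discriminatory power by the share of a term's occurrences that fall inside the sub-domain.

```python
from collections import Counter

# Toy sub-domain corpora standing in for PubMed-derived Spanish titles/abstracts.
corpora = {
    "cardiology": "infarto agudo de miocardio y arritmia cardiaca",
    "neurology": "crisis epileptica y cefalea cronica con aura",
}

def tokens(text):
    return text.lower().split()

# Per-sub-domain and overall term counts.
counts = {sub: Counter(tokens(txt)) for sub, txt in corpora.items()}
total = sum(counts.values(), Counter())

def weight(term, sub):
    """Relevance (in-domain frequency) times discriminatory power
    (fraction of the term's occurrences falling in this sub-domain)."""
    c = counts[sub][term]
    return 0.0 if c == 0 else c * (c / total[term])

print(weight("miocardio", "cardiology"))  # domain-specific term scores high
print(weight("y", "cardiology"))          # term shared across sub-domains is discounted
```

The same idea extends from unigrams to the token n-grams the paper collects: terms concentrated in one sub-domain's corpus are up-weighted, while broadly distributed terms are penalized.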


2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.
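As a point of reference for the families surveyed above, here is a minimal corpus-based baseline: bag-of-words vectors compared by cosine similarity. The sentences are toy examples and the method is deliberately the simplest member of its family, which is what knowledge-based and transformer-based approaches improve upon.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)           # vb[t] is 0 for absent terms
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb)

print(cosine("the cat sat on the mat", "the cat sat on the rug"))   # high overlap
print(cosine("the cat sat on the mat", "stock prices fell sharply"))  # no overlap
```

The weakness this exposes is exactly the one the survey addresses: surface overlap misses synonymy ("mat" vs. "rug" score zero against each other), motivating knowledge-based and embedding-based measures.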


2021 ◽  
Author(s):  
Marciane Mueller ◽  
Rejane Frozza ◽  
Liane Mählmann Kipper ◽  
Ana Carolina Kessler

BACKGROUND This article presents the modeling and development of a knowledge-based system, supported by a virtual conversational agent called Dóris. Using natural language processing resources, Dóris collects the clinical data of patients in urgent and emergency hospital care. OBJECTIVE The main objective is to validate the use of virtual conversational agents to properly and accurately collect the data necessary to run the evaluation flowcharts used to classify the degree of urgency of patients and determine the priority of medical care. METHODS The agent's knowledge base was modeled using the rules provided in the evaluation flowcharts that make up the Manchester Triage System. The system also supports simple, objective, and complete communication, through dialogues that assess signs and symptoms according to the criteria established by a standardized, validated, and internationally recognized system. RESULTS In addition to verifying the applicability of Artificial Intelligence techniques in a complex health care domain, the article presents a tool that helps not only to improve organizational processes but also to improve human relationships, bringing professionals and patients closer together. The system's knowledge base was modeled on the IBM Watson platform. CONCLUSIONS The results of simulations carried out by a human specialist show that a knowledge-based system supported by a virtual conversational agent is feasible for the domain of risk classification and the prioritization of medical care for patients in emergency hospital care.
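A sketch of the rule-based idea behind such a knowledge base. The discriminators and priority labels below are illustrative placeholders, not the licensed Manchester Triage System flowcharts: the shape of the logic is an ordered list of discriminator rules, where the first rule that matches the patient's signs and symptoms fixes the priority.

```python
# Ordered (discriminator predicate, priority) rules; highest urgency first.
RULES = [
    (lambda s: s.get("airway_compromise"), "immediate"),
    (lambda s: s.get("severe_pain"), "very urgent"),
    (lambda s: s.get("moderate_pain"), "urgent"),
    (lambda s: s.get("recent_problem"), "standard"),
]

def classify(signs_and_symptoms):
    """Return the priority of the first matching rule, else the lowest category."""
    for predicate, priority in RULES:
        if predicate(signs_and_symptoms):
            return priority
    return "non-urgent"

print(classify({"severe_pain": True}))
print(classify({}))
```

In the article's setting, the dialogue with the conversational agent is what fills in the `signs_and_symptoms` answers that the flowchart rules then evaluate.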


2018 ◽  
Vol 7 (3.33) ◽  
pp. 168
Author(s):  
Yonglak SHON ◽  
Jaeyoung PARK ◽  
Jangmook KANG ◽  
Sangwon LEE

LOD data sets consist of RDF triples based on ontologies, specifications of existing facts, linked to previously published knowledge according to the Linked Data principles. These structured LOD clouds form a large global data network, which provides a more accurate foundation for delivering the information users want. However, when the same real-world object appears in several LOD data sets under different identifiers, it is difficult to establish that the occurrences are in fact the same entity: objects with different URIs are presumed distinct, and their properties must be closely examined for similarity before they can be judged identical. This study proposes a model, RILE, that evaluates similarity by comparing the object values of specified predicates. Experiments with the model confirmed an improvement in the confidence level of the connections obtained by extracting the link values.
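A hypothetical sketch of the underlying idea: score two resources with different URIs by the overlap of their (predicate, object) pairs. This is a plain Jaccard overlap over toy triples, not RILE's actual scoring, but it shows why comparing object values of shared predicates can support an identity judgment.

```python
def statements(triples, subject):
    """All (predicate, object) pairs asserted about `subject`."""
    return {(p, o) for s, p, o in triples if s == subject}

def similarity(triples, uri_a, uri_b):
    """Jaccard overlap of the two subjects' (predicate, object) pairs."""
    a, b = statements(triples, uri_a), statements(triples, uri_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy triples: the same city described in two data sets under different URIs.
triples = [
    ("ex:Paris", "rdfs:label", "Paris"),
    ("ex:Paris", "ex:country", "France"),
    ("dbp:Paris", "rdfs:label", "Paris"),
    ("dbp:Paris", "ex:country", "France"),
    ("dbp:Paris", "ex:population", "2.1M"),
]

print(similarity(triples, "ex:Paris", "dbp:Paris"))  # substantial overlap
```

A high score here would raise the confidence of an identity link (e.g., an `owl:sameAs` assertion) between the two URIs.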


2015 ◽  
Vol 12 (2) ◽  
pp. 432-443 ◽  
Author(s):  
Luis Riazuelo ◽  
Moritz Tenorth ◽  
Daniel Di Marco ◽  
Marta Salas ◽  
Dorian Galvez-Lopez ◽  
...  

Author(s):  
Saravanakumar Kandasamy ◽  
Aswani Kumar Cherukuri

Quantifying the semantic similarity between concepts is an inevitable part of domains such as Natural Language Processing, Information Retrieval, and Question Answering, where texts and their relationships must be understood. Over the last few decades, many measures have been proposed that incorporate various corpus-based and knowledge-based resources. WordNet and Wikipedia are two such knowledge-based resources. WordNet's contribution to these domains is enormous due to its richness in defining a word and all of its relationships with others. In this paper, we propose an approach to quantifying the similarity between concepts that exploits the synsets and gloss definitions of different concepts using WordNet. Our method considers the gloss definitions, the contextual words that help define a word, the synsets of those contextual words, and the confidence of a word's occurrence in another word's definition when calculating similarity. Evaluation on different gold-standard benchmark datasets shows the efficiency of our system in comparison with other existing taxonomical and definitional measures.
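The gloss-based part of such definitional measures can be sketched as a Lesk-style overlap count. The short "glosses" below are hand-made stand-ins, not real WordNet entries, and real systems add synset expansion and occurrence-confidence weighting on top of this basic overlap.

```python
STOPWORDS = {"a", "an", "the", "of", "for", "used"}

# Toy gloss dictionary standing in for WordNet definitions.
GLOSS = {
    "car": "a motor vehicle with wheels used for transport",
    "bus": "a large motor vehicle carrying passengers",
    "apple": "the round fruit of a tree",
}

def content_words(text):
    """Lowercased gloss tokens with stop-words removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def gloss_overlap(w1, w2):
    """Count content words shared by the two definitions (Lesk-style)."""
    return len(content_words(GLOSS[w1]) & content_words(GLOSS[w2]))

print(gloss_overlap("car", "bus"))    # related concepts share gloss vocabulary
print(gloss_overlap("car", "apple"))  # unrelated concepts share none
```

Extending the comparison to the synsets of the contextual words, as the paper describes, lets "automobile" in one gloss match "car" in another even without a surface overlap.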


Author(s):  
Hayden Wimmer ◽  
Roy Rada

Artificial intelligence techniques have long been applied to financial investing scenarios to determine market inefficiencies, criteria for credit scoring, and bankruptcy prediction, to name a few. While there are many subfields of artificial intelligence, this work seeks to identify the AI techniques most commonly applied to financial investing in the academic literature. AI techniques, such as knowledge-based systems, machine learning, and natural language processing, are integrated into systems that simultaneously address data identification, asset valuation, and risk management. Future trends will continue to integrate hybrid artificial intelligence techniques into financial investing, portfolio optimization, and risk management. The remainder of this article summarizes the key contributions of applying AI to financial investing as they appear in the academic literature.


Author(s):  
Azleena Mohd Kassim ◽  
Yu-N Cheah

Information Technology (IT) is often employed to put knowledge management policies into operation. However, many of these tools require human intervention when it comes to deciding how knowledge is to be managed. The Semantic Web may be an answer to this issue, but many Semantic Web tools are not readily available to the regular IT user. Another problem is that typical efforts to apply or reuse knowledge via a search mechanism do not necessarily link to other pages that are relevant. Blogging systems appear to address some of these challenges, but the browsing experience can be further enhanced by providing links to other relevant posts. In this chapter, the authors present a semantic blogging tool called SEMblog to identify, organize, and reuse knowledge based on the Semantic Web and ontologies. The SEMblog methodology brings together technologies such as Natural Language Processing (NLP), Semantic Web representations, and the ubiquity of the blogging environment to produce a more intuitive way to manage knowledge, especially in the areas of knowledge identification, organization, and reuse. Based on detailed comparisons with other similar systems, the uniqueness of SEMblog lies in its ability to automatically generate keywords and semantic links.


2012 ◽  
pp. 1215-1236 ◽  
Author(s):  
Farid Meziane ◽  
Sunil Vadera

Artificial intelligence techniques such as knowledge-based systems, neural networks, fuzzy logic, and data mining have been advocated by many researchers and developers as the way to improve many software development activities. As with many other disciplines, software development quality improves with the experience and knowledge of the developers, past projects, and expertise. Software also evolves as it operates in changing and volatile environments, so there is significant potential for using AI to improve all phases of the software development life cycle. This chapter provides a survey of the use of AI for software engineering that covers the main software development phases and AI methods such as natural language processing techniques, neural networks, genetic algorithms, fuzzy logic, ant colony optimization, and planning methods.


