Text Mining-Supported Information Extraction: An Extended Methodology for Developing Information Extraction Systems

Author(s):  
Christina Feilmayr
2015, Vol 6 (4), pp. 35-49
Author(s):  
Laurent Issertial, Hiroshi Tsuji

This paper proposes CFP Manager, a system specialized for the IT field and designed to ease the search for conferences suited to one's needs. At present, the handling of calls for papers (CFPs) faces two problems. For emails, the huge quantity of CFPs received means they tend to be skimmed through rather than read. For websites, a review of some of the main CFP aggregators available online points to a lack of usable search criteria. The system addresses these problems with an architecture of three components. The first is an Information Extraction module that extracts relevant information (such as date and location) from CFPs using a rule-based text mining algorithm. The second enriches the extracted data with external data from ontology models. The third displays the data and lets the end user run complex queries on the CFP dataset, so that only suitable CFPs are returned. To validate the proposal, the authors evaluate the information extraction component with the well-known precision and recall metrics, obtaining averages of 0.95 for precision and 0.91 for recall on three different datasets of 100 CFPs each. Finally, the paper discusses the validity of the approach by comparing the system, on different queries, with two systems already available online (WikiCFP and IEEE Conference Search) and with a basic text search standing in for searching an email inbox. On a 100-CFP dataset, the wide variety of usable data and the ability to perform complex queries let the system surpass the basic text search and WikiCFP by not returning the false positives they usually return, and yield results close to those of the IEEE system.
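As a rough illustration of the evaluation described in this abstract, the sketch below computes precision and recall for one set of extracted fields against a hand-labelled gold set. The function name, the field names, and the sample values are hypothetical and are not taken from the paper.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (field, value) pairs for a single CFP; not data from the paper.
gold = {
    ("name", "ExampleConf"),
    ("date", "2015-06-01"),
    ("location", "Osaka"),
    ("deadline", "2015-02-15"),
}
extracted = {
    ("date", "2015-06-01"),
    ("location", "Osaka"),
    ("deadline", "2015-03-15"),  # wrong value, counts as a false positive
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Averaging these two values over every CFP in a dataset would give per-dataset figures of the kind the abstract reports.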


2021
Author(s):  
Tatsawan Timakum, Min Song, Qing Xie

Abstract. Background: E-mental healthcare is the convergence of digital technologies with mental health services. It has been developed to fill a gap in healthcare for people who need mental wellbeing support and may never otherwise receive psychological treatment. This study aimed to apply text mining techniques to analyze the large body of e-mental health research and to report on research clusters and trends, as well as the co-occurrence of biomedical entities and the use of information technology in this field. Methods: The e-mental health research data was obtained from 3,663 bibliographic records from Web of Science (WoS) and 3,172 full-text articles from PubMed Central (PMC). The text mining techniques used in this study included bibliometric analysis, information extraction, and visualization. Results: The e-mental health research topic trends primarily involved e-health care services and medical informatics research. The research falls into 16 clusters, which cover mental illness, e-health, diseases, IT, and self-management. Based on the information extraction analysis, in the biomedical domain the "depression" entity was frequently detected, and it pairs with other entities in the network with a betweenness centrality weighted at 0.046869 (e.g., depression-online, depression-diabetes, depression-measure, and depression-mobile). The IT entity-relations of "mobile" were the most frequently found (weighted at 0.043466). The top pairs are related to depression, mobile health, and text message. Conclusions: E-mental health research trends focused on disease-related depression and on using IT for treatment and prevention, primarily via online and mobile devices. The development of AI and machine learning is also being studied for e-mental healthcare. The results illustrate that physical sickness is likely to cause a mental health problem, and they identify the IT that was applied to help manage and mitigate mental health impacts.


2021, Vol 3
Author(s):  
Luke T. Slater, Andreas Karwath, Robert Hoehndorf, Georgios V. Gkoutos

Semantic similarity is a useful approach for comparing patient phenotypes, and has the potential to be an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how the inclusion and exclusion of negated and uncertain mentions of concepts in text-derived phenotypes affect the similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and find a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.


2020, Vol 10 (3), pp. 35
Author(s):  
Ahmed Adeeb Jalal

The world of technology has evolved greatly over the past decades, which has led to inflated data volumes. This progress of digital technology has generated texts scattered across millions of web pages. Unstructured texts contain a vast amount of textual data, and discovering useful and interesting relations in them requires further processing by computers. Text mining and information extraction have therefore become an exciting research field for obtaining structured and valuable information. This paper focuses on text pre-processing in the automotive advertisement domain to build a structured database. The structured database was created by extracting information from unstructured automotive advertisements, a task that belongs to natural language processing. Information extraction deals with finding factual information in text using learned regular expressions. We manually craft specific rule-based approaches to extract structured information from unstructured web pages. The structured information is provided through a user-friendly search engine designed for topic-specific knowledge. Consequently, the information extracted from these advertisements is used to perform a structured search over certain attributes of interest. The tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries.
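The rule-based extraction outlined in this abstract can be pictured with a minimal sketch: one hand-written regular expression per attribute, applied to the text of an advertisement to produce a structured record. The patterns, field names, and sample advertisement below are illustrative assumptions, not the rules used in the paper.

```python
import re

# Hand-written rules, one regular expression per attribute (illustrative only).
RULES = {
    "year": re.compile(r"\b(19[89]\d|20[0-2]\d)\b"),
    "price": re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*)"),
    "mileage": re.compile(r"(\d{1,3}(?:,\d{3})*)\s?(?:km|miles)\b", re.IGNORECASE),
}


def extract(ad_text):
    """Apply each rule to the ad and return one structured record."""
    record = {}
    for field, pattern in RULES.items():
        match = pattern.search(ad_text)
        # Keep the first match for each field; None marks a missing attribute.
        record[field] = match.group(1) if match else None
    return record


# Hypothetical advertisement text, invented for this example.
ad = "For sale: 2012 Toyota Corolla, 85,000 km, excellent condition. Asking $7,500."
record = extract(ad)
print(record)
```

Records of this shape could then be stored as rows of a database and queried by attribute, which is the kind of structured search the abstract describes.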

