Mining Semantic Structures from Syntactic Structures in Free Text Documents

Expanded Semantic Graph Representation for Matching Related Information of Interest across Free Text Documents

2012 IEEE Sixth International Conference on Semantic Computing ◽

10.1109/icsc.2012.45 ◽

2012 ◽

Cited By ~ 2

Author(s):

James R. Johnson ◽

Anita Miller ◽

Latifur Khan ◽

Bhavani Thuraisingham

Keyword(s):

Graph Representation ◽

Free Text ◽

Text Documents ◽

Semantic Graph ◽

Related Information

Download Full-text

Representation of Information about Family Relatives as Structured Data in Electronic Health Records

Applied Clinical Informatics ◽

10.4338/aci-2013-10-ra-0080 ◽

2014 ◽

Vol 05 (02) ◽

pp. 349-367 ◽

Cited By ~ 13

Author(s):

Y. Lu ◽

C.J. Vitale ◽

P.L. Mar ◽

F. Chang ◽

N. Dhopeshwarkar ◽

...

Keyword(s):

Family History ◽

Structured Data ◽

Free Text ◽

Snomed Ct ◽

Family History Information ◽

Text Documents ◽

Health Records ◽

Relative Information ◽

History Information ◽

Electronic Health

SummaryBackground: The ability to manage and leverage family history information in the electronic health record (EHR) is crucial to delivering high-quality clinical care.Objectives: We aimed to evaluate existing standards in representing relative information, examine this information documented in EHRs, and develop a natural language processing (NLP) application to extract relative information from free-text clinical documents.Methods: We reviewed a random sample of 100 admission notes and 100 discharge summaries of 198 patients, and also reviewed the structured entries for these patients in an EHR system’s family history module. We investigated the two standards used by Stage 2 of Meaningful Use (SNOMED CT and HL7 Family History Standard) and identified coverage gaps of each standard in coding relative information. Finally, we evaluated the performance of the MTERMS NLP system in identifying relative information from free-text documents.Results: The structure and content of SNOMED CT and HL7 for representing relative information are different in several ways. Both terminologies have high coverage to represent local relative concepts built in an ambulatory EHR system, but gaps in key concept coverage were detected; coverage rates for relative information in free-text clinical documents were 95.2% and 98.6%, respectively. Compared to structured entries, richer family history information was only available in free-text documents. Using a comprehensive lexicon that included concepts and terms of relative information from different sources, we expanded the MTERMS NLP system to extract and encode relative information in clinical documents and achieved a corresponding precision of 100% and recall of 97.4%.Conclusions: Comprehensive assessment and user guidance are critical to adopting standards into EHR systems in a meaningful way. A significant portion of patients’ family history information is only documented in free-text clinical documents and NLP can be used to extract this information.Citation: Zhou L, Lu Y, Vitale CJ, Mar PL, Chang F, Dhopeshwarkar N, Rocha RA. Representation of information about family relatives as structured data in electronic health records. Appl Clin Inf 2014; 5: 349–367 http://dx.doi.org/10.4338/ACI-2013-10-RA-0080

Download Full-text

A Comprehensive Framework Towards Information Sharing Between Government Agencies

E-Government Diffusion, Policy, and Impact ◽

10.4018/978-1-60566-130-8.ch004 ◽

2009 ◽

pp. 43-59

Author(s):

Akhilesh Bajaj ◽

Sudha Ram

Keyword(s):

Information Sharing ◽

Data Storage ◽

Heterogeneous Data ◽

Information Storage ◽

Government Agencies ◽

Free Text ◽

Text Documents ◽

Shared Information ◽

Share Data ◽

Reducing Costs

Recently, there has been increased interest in sharing digitized information between government agencies, with the goals of improving security, reducing costs, and offering better quality service to users of government services. The bulk of previous work in interagency information sharing has focused largely on the sharing of structured information among heterogeneous data sources, whereas government agencies need to share data with varying degrees of structure ranging from free text documents to relational data. In this work, we explore the different technologies available to share information. Specifically, our framework discusses the optional data storage mechanisms required to support a Service Oriented Architecture (SOA). We compare XML document, free text search engine, and relational database technologies and analyze the pros and cons of each approach. We explore these options along the dimensions of information definition, information storage, the access to this information, and finally the maintenance of shared information.

Download Full-text

Mining Free Text for Structure

Data Mining ◽

10.4018/978-1-59140-051-6.ch012 ◽

2011 ◽

pp. 278-300

Author(s):

Vladimir A. Kulyukin ◽

Robin Burke

Keyword(s):

Structural Organization ◽

Knowledge Engineering ◽

Knowledge Bases ◽

Free Text ◽

Learning Approaches ◽

Structural Components ◽

Text Documents ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Ultimate Objective

Knowledge of the structural organization of information in documents can be of significant assistance to information systems that use documents as their knowledge bases. In particular, such knowledge is of use to information retrieval systems that retrieve documents in response to user queries. This chapter presents an approach to mining free-text documents for structure that is qualitative in nature. It complements the statistical and machine-learning approaches, insomuch as the structural organization of information in documents is discovered through mining free text for content markers left behind by document writers. The ultimate objective is to find scalable data mining (DM) solutions for free-text documents in exchange for modest knowledge-engineering requirements. The problem of mining free text for structure is addressed in the context of finding structural components of files of frequently asked questions (FAQs) associated with many USENET newsgroups. The chapter describes a system that mines FAQs for structural components. The chapter concludes with an outline of possible future trends in the structural mining of free text.

Download Full-text

A Comprehensive Framework Towards Information Sharing Between Government Agencies

Database Technologies ◽

10.4018/978-1-60566-058-5.ch104 ◽

2009 ◽

pp. 1723-1740

Author(s):

Akhilesh Bajaj ◽

Sudha Ram

Keyword(s):

Information Sharing ◽

Data Storage ◽

Heterogeneous Data ◽

Information Storage ◽

Government Agencies ◽

Free Text ◽

Text Documents ◽

Shared Information ◽

Share Data ◽

Reducing Costs

Recently, there has been increased interest in sharing digitized information between government agencies, with the goals of improving security, reducing costs, and offering better quality service to users of government services. The bulk of previous work in interagency information sharing has focused largely on the sharing of structured information among heterogeneous data sources, whereas government agencies need to share data with varying degrees of structure ranging from free text documents to relational data. In this work, we explore the different technologies available to share information. Specifically, our framework discusses the optional data storage mechanisms required to support a Service Oriented Architecture (SOA). We compare XML document, free text search engine, and relational database technologies and analyze the pros and cons of each approach. We explore these options along the dimensions of information definition, information storage, the access to this information, and finally the maintenance of shared information.

Download Full-text

Ontology-based Document Spanning Systems for Information Extraction

International Journal of Semantic Computing ◽

10.1142/s1793351x20400012 ◽

2020 ◽

Vol 14 (01) ◽

pp. 3-26

Author(s):

Domenico Lembo ◽

Federico Maria Scafoglieri

Keyword(s):

Information Extraction ◽

Relational Databases ◽

Description Logics ◽

Data Access ◽

Data Interpretation ◽

Semantic Integration ◽

Free Text ◽

Text Documents ◽

Other Information ◽

Data Layer

Information Extraction (IE) is the task of automatically organizing in a structured form data extracted from free text documents. In several contexts, it is often desirable that the extracted data are then organized according to an ontology, which provides a formal and conceptual representation of the domain of interest. Ontologies allow for a better data interpretation, as well as for their semantic integration with other information, as in Ontology-based Data Access (OBDA), a popular declarative framework for data management where an ontology is connected to a data layer through mappings. However, the data layer considered so far in OBDA has consisted essentially of relational databases, and how to declaratively couple an ontology with unstructured data sources is still unexplored. By leveraging the recent study on document spanners for rule-based IE by Fagin et al., in this paper, we propose a new framework that allows to map text documents to ontologies, in the spirit of OBDA. We investigate the problem of answering conjunctive queries in this framework. For ontologies specified in the Description Logics [Formula: see text] and [Formula: see text], we show that the problem is polynomial in the size of the underlying documents. We also provide algorithms to solve query answering by rewriting the input query on the basis of the ontology and its mapping toward the source documents. Through these techniques, we pursue a virtual approach, similar to that typically adopted in OBDA, which allows us to answer a query without having to first populate the entire ontology. Interestingly, for [Formula: see text], both the spanners used in the mapping and the one computed by the rewriting algorithm belong to the same class of expressiveness. This holds also for [Formula: see text], modulo some limitations on the form of the mapping. These results say that in these cases our framework can be easily implemented by decoupling ontology management and document access, which can be delegated to an external IE system able to process the extraction rules we use in the mapping.

Download Full-text

Privacy Measures for Free Text Documents: Bridging the Gap between Theory and Practice

Trust, Privacy and Security in Digital Business - Lecture Notes in Computer Science ◽

10.1007/978-3-642-22890-2_14 ◽

2011 ◽

pp. 161-173 ◽

Cited By ~ 2

Author(s):

Liqiang Geng ◽

Yonghua You ◽

Yunli Wang ◽

Hongyu Liu

Keyword(s):

Theory And Practice ◽

Free Text ◽

Text Documents ◽

Privacy Measures

Download Full-text

Identification of related information of interest across free text documents

Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics ◽

10.1109/isi.2011.5984058 ◽

2011 ◽

Cited By ~ 6

Author(s):

James R. Johnson ◽

Anita Miller ◽

Latifur Khan ◽

Bhavani Thuraisingham ◽

Murat Kantarcioglu

Keyword(s):

Free Text ◽

Text Documents ◽

Related Information

Download Full-text

APPLICATION OF SYSTEMS APPROACH TO AUTOMATIC INTERPRETATION OF FREE TEXT DOCUMENTS

Systems Approaches in Computer Science and Mathematics ◽

10.1016/b978-0-08-027202-3.50060-2 ◽

1981 ◽

pp. 2387-2391

Author(s):

W. Delaney ◽

L. Guerriero ◽

C. Mero ◽

E. Cannavò ◽

E. Vaccari

Keyword(s):

Systems Approach ◽

Free Text ◽

Text Documents ◽

Automatic Interpretation

Download Full-text

Full-Text Search Engines for Databases

Database Technologies ◽

10.4018/978-1-60566-058-5.ch053 ◽

2009 ◽

pp. 931-939

Author(s):

László Kovács ◽

Domonkos Tikk

Keyword(s):

Information Retrieval ◽

Full Text ◽

Search Engines ◽

Free Text ◽

Text Search ◽

Text Documents ◽

Full Text Search ◽

Textual Data ◽

Efficient Information ◽

Particular Solution

Current databases are able to store several Tbytes of free-text documents. The main purpose of a database from the user’s viewpoint is the efficient information retrieval. In the case of textual data, information retrieval mostly concerns the selection and the ranking of documents. We present here the particular solution of Oracle; there for making the full-text querying more efficient, a special engine was developed that performs the preparation of full-text queries and provides a set of language and semantic specific query operators.

Download Full-text