A Framework for Automatic Causality Extraction Using Semantic Similarity

Author(s):  
Sanghee Kim
Rob H. Bracewell
Ken M. Wallace

Textual documents are the most common way of storing and distributing information within organizations. Extracting useful information from large text collections is therefore a goal of every organization that wants to exploit the experience encapsulated in those texts. Entering data in a free-text style is easy, as it requires no special training, but unstructured texts pose a major challenge for automatic extraction and retrieval systems. Deep text analysis using advanced and complex linguistic processing is generally necessary, involving both computational linguistics experts and domain experts. Linguistic experts are rare in engineering organizations, which therefore find it difficult to apply and exploit such advanced extraction techniques. It is thus desirable to minimize the involvement of linguistic experts by learning extraction patterns automatically from example texts. This requires analyzing the given texts to identify the scope of the task and suitable automatic methods. Focusing on causality reasoning in the field of fault diagnosis, this paper presents the results of experimenting with an automatic causality extraction method based on shallow linguistic processing.
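To illustrate what shallow, pattern-based causality extraction looks like in practice, the following is a minimal sketch using lexical cue patterns. The cue phrases and regular expressions here are illustrative assumptions, not the learned patterns from the paper.

```python
import re

# Illustrative cue patterns for shallow causality extraction.
# These cues and regexes are invented examples, not the paper's patterns.
CUE_PATTERNS = [
    # "<effect> because of / due to / caused by <cause>"
    re.compile(r"(?P<effect>.+?)\s+(?:because of|due to|caused by)\s+(?P<cause>.+)", re.I),
    # "<cause> leads to / results in / causes <effect>"
    re.compile(r"(?P<cause>.+?)\s+(?:leads to|results in|causes)\s+(?P<effect>.+)", re.I),
]

def extract_causal_pairs(sentence):
    """Return (cause, effect) pairs matched by any cue pattern."""
    pairs = []
    for pattern in CUE_PATTERNS:
        m = pattern.search(sentence)
        if m:
            pairs.append((m.group("cause").strip(" ."), m.group("effect").strip(" .")))
    return pairs

print(extract_causal_pairs("The pump failed because of bearing wear."))
```

A real system would learn such patterns from example texts rather than hand-code them, but the cue-matching step itself stays this shallow.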

1977
Vol 16 (03)
pp. 144-153
Author(s):
E. Vaccari
W. Delaney
A. Chiesa

A software system for the automatic free-text analysis and retrieval of radiological reports is presented. The software involves: (1) automatic translation of the specific natural language into a formalized metalanguage, in order to transform the radiological report into a »normalized report« analyzable by computer; (2) content processing of the normalized report to select the desired information. The approach used to accomplish step (1) is described in detail with reference to a specific application.
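A minimal sketch of step (1), the translation of free text into metalanguage tokens, might look as follows. The phrase table and the codes are invented for illustration; the paper's actual metalanguage is not reproduced here.

```python
# Illustrative rule table mapping free-text radiological phrases onto a
# small normalized vocabulary. Phrases and codes are invented examples.
NORMALIZATION_RULES = {
    "no evidence of": "NEGATION",
    "opacity": "FINDING:OPACITY",
    "right lung": "SITE:LUNG_RIGHT",
    "left lung": "SITE:LUNG_LEFT",
}

def normalize_report(text):
    """Translate a free-text sentence into a sequence of metalanguage tokens."""
    tokens = []
    lowered = text.lower()
    for phrase, code in NORMALIZATION_RULES.items():
        if phrase in lowered:
            tokens.append(code)
    return tokens

print(normalize_report("No evidence of opacity in the right lung."))
```

Once reports are in this normalized form, step (2), content processing, reduces to matching over a fixed token vocabulary rather than over free text.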


1991
Vol 30 (04)
pp. 275-283
Author(s):
P. M. Pietrzyk

Abstract: Much information about patients is stored as free text. Hence, the computerized processing of medical language data has long been a goal of medical informatics, resulting in different paradigms. In Göttingen, a Medical Text Analysis System for German (abbr. MediTAS) has been under development for some time, trying to combine and extend these paradigms. This article concentrates on the automated syntax analysis of German medical utterances. The investigated text material consists of 8,790 distinct utterances extracted from the summary sections of about 18,400 cytopathological findings reports. The parsing is based on a new approach, Left-Associative Grammar (LAG), developed by Hausser. By considerably extending the LAG approach, most of the grammatical constructions occurring in the text material could be covered.
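The left-associative strategy combines the growing sentence start with the next word, one word at a time. The toy below illustrates only that combination strategy; the lexicon, categories, and rules are invented and do not reproduce Hausser's actual LAG rule format.

```python
# Toy illustration of left-associative, word-by-word parsing.
# Lexicon, categories, and rules are invented examples.
LEXICON = {"the": "DET", "smear": "N", "is": "V", "negative": "ADJ"}

# Each rule maps (category of sentence start, category of next word)
# to the category of the new, longer sentence start.
RULES = {
    ("DET", "N"): "NP",
    ("NP", "V"): "NP+V",
    ("NP+V", "ADJ"): "S",
}

def parse_left_associative(words):
    """Combine words strictly left to right; return final category or None."""
    state = LEXICON[words[0]]
    for word in words[1:]:
        state = RULES.get((state, LEXICON[word]))
        if state is None:
            return None
    return state

print(parse_left_associative(["the", "smear", "is", "negative"]))
```

Because combination always happens at the left edge, the parser never backtracks over earlier words, which is the property that makes the approach attractive for processing large report collections.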


Author(s):  
ELSAYED ATLAM

Conventional approaches to text analysis and information retrieval, which measure document similarity by considering all the information in a text, are relatively inefficient for processing large text collections in heterogeneous subject areas. Previous research showed that evidence from passages can improve retrieval results, but it also raised questions about how passages should be defined, how they can be ranked efficiently, and what their proper role is in long, structured documents. Moreover, the frequency of the article "the" in important sentences can be exploited to summarize a text efficiently. We previously proposed an approach that extracts sentences containing the article "the", subject to certain restriction rules, to produce effective passages. Building on that work, this paper presents a new Passage SIMilarity (P-SIM) measure between documents, computed over the effective passages extracted using the article "the". Our experiments show that this method is more efficient than traditional methods: Recall and Precision reach 92.6% and 97.5%, respectively, on the extracted passages, improvements of 38.3% and 44.2% over the traditional method. The proposed methods are applied to 3,990 articles from a large tagged corpus.
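The overall shape of such a passage-based similarity can be sketched as follows: keep only sentences containing "the", then compare documents by cosine similarity over word counts of those sentences. The extraction rule and the similarity function here are simplified stand-ins for the paper's restriction rules and P-SIM measure.

```python
import math
import re
from collections import Counter

def effective_passages(document):
    """Keep sentences containing the article "the" (simplified rule)."""
    sentences = re.split(r"(?<=[.!?])\s+", document)
    return [s for s in sentences if re.search(r"\bthe\b", s, re.I)]

def passage_vector(document):
    """Word counts over the effective passages only."""
    words = re.findall(r"[a-z]+", " ".join(effective_passages(document)).lower())
    return Counter(words)

def p_sim(doc_a, doc_b):
    """Cosine similarity between the passage vectors of two documents."""
    a, b = passage_vector(doc_a), passage_vector(doc_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Because only a subset of sentences survives extraction, the vectors being compared are much smaller than whole-document vectors, which is where the claimed efficiency gain comes from.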


Author(s):  
Wouter van Atteveldt
Kasper Welbers
Mariken van der Velden

Analyzing political text can answer many pressing questions in political science, from understanding political ideology to mapping the effects of censorship in authoritarian states. This makes the study of political text and speech an important part of the political science methodological toolbox. The confluence of increasing availability of large digital text collections, plentiful computational power, and methodological innovations has led to many researchers adopting techniques of automatic text analysis for coding and analyzing textual data. In what is sometimes termed the “text as data” approach, texts are converted to a numerical representation, and various techniques such as dictionary analysis, automatic scaling, topic modeling, and machine learning are used to find patterns in and test hypotheses on these data. These methods all make certain assumptions and need to be validated to assess their fitness for any particular task and domain.
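The first step of the "text as data" approach, converting texts to a numerical representation, can be sketched as a minimal bag-of-words document-term matrix. Real analyses would add proper tokenization, term weighting, and validation, as the abstract stresses.

```python
from collections import Counter

def document_term_matrix(documents):
    """Build a bag-of-words document-term matrix.

    Returns the sorted vocabulary and, for each document, a row of
    term counts aligned with that vocabulary.
    """
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set(term for c in counts for term in c))
    return vocabulary, [[c[t] for t in vocabulary] for c in counts]

docs = ["tax cuts now", "no tax cuts"]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(matrix)
```

Dictionary analysis, scaling, topic modeling, and machine learning all operate on numerical representations of this general kind, which is why validating the representation matters for every downstream method.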


1983
Vol 6 (5)
pp. 165-172
Author(s):  
F.N. Teskey

In this paper the existing functions of, and a number of future requirements for, information retrieval systems are discussed. Two basic requirements for free text information retrieval systems have been identified: one for a more general information modelling language and the other for a simple user interface for complex ad-hoc queries. The paper describes some existing and proposed hardware and software methods for implementing free text information retrieval systems. Emphasis is placed on methods of improving the functionality of the system rather than on methods of increasing the performance. It is suggested that considerable improvements can be achieved by a more imaginative use of existing hardware, though it is realised that special purpose architectures will play an increasingly important role in information systems. The paper concludes with a design for a new information retrieval system based on the use of the Binary Relationship Model for information storage and retrieval, and an interactive graphical display for the user interface.


Author(s):  
Daniel Paysan
Luis Haug
Michael Bajka
Markus Oelhafen
Joachim M. Buhmann

Abstract
Purpose: Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize and assess the activities performed by users. Since annotations of trajectories by human experts are expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.
Methods: We use self-supervised training of deep encoder–decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. Such features are processed as inputs of an unsupervised surgical activity recognition pipeline.
Results: Our experiments document that the performance of hidden semi-Markov models used for recognizing activities in a simulated myomectomy scenario benefits from using features extracted from representations learned while training a deep encoder–decoder network on the task of predicting the remaining surgery progress.
Conclusion: Our work is an important first step towards making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the existing data is annotated by human domain experts and where those annotations are potentially incomplete.
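The recognition step, decoding a most-likely activity sequence from observed features, can be sketched with the Viterbi algorithm over a plain HMM. This is a simplified stand-in for the hidden semi-Markov models used in the paper (HSMMs additionally model state durations), and every probability below is invented for illustration.

```python
import math

# Toy two-activity model; all states and probabilities are invented.
STATES = ["grasp", "cut"]
START = {"grasp": 0.6, "cut": 0.4}
TRANS = {"grasp": {"grasp": 0.7, "cut": 0.3},
         "cut": {"grasp": 0.4, "cut": 0.6}}
# Emission probabilities of a discretized feature ("low"/"high" motion).
EMIT = {"grasp": {"low": 0.8, "high": 0.2},
        "cut": {"low": 0.3, "high": 0.7}}

def viterbi(observations):
    """Return the most probable activity sequence for the observations."""
    # Each trellis column maps state -> (log-probability, best path so far).
    trellis = [{s: (math.log(START[s]) + math.log(EMIT[s][observations[0]]), [s])
                for s in STATES}]
    for obs in observations[1:]:
        column = {}
        for s in STATES:
            score, path = max(
                (trellis[-1][p][0] + math.log(TRANS[p][s]) + math.log(EMIT[s][obs]),
                 trellis[-1][p][1]) for p in STATES)
            column[s] = (score, path + [s])
        trellis.append(column)
    return max(trellis[-1].values())[1]

print(viterbi(["low", "low", "high"]))
```

In the paper's setting, the observations would be the features extracted from the learned trajectory representations rather than a hand-discretized signal.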


Author(s):  
Rebecca Wilson
Oliver Butters
Demetris Avraam
Andrew Turner
Paul Burton

ABSTRACT
Objectives: DataSHIELD (www.datashield.ac.uk) was born of the requirement in the biomedical and social sciences to co-analyse individual patient data (microdata) from different sources without disclosing identity or sensitive information. Under DataSHIELD, raw data never leave the data provider, and no microdata or disclosive information can be seen by the researcher. The analysis is taken to the data, not the data to the analysis. Text data can be very disclosive in the biomedical domain (patient records, GP letters, etc.). Similar, but different, issues are present in other domains: text could be copyrighted, or have a large IP value, making sharing impractical.
Approach: By treating text in a way analogous to individual patient data, we assessed whether DataSHIELD could be adapted and implemented for text analysis, circumventing the key obstacles that currently prevent it.
Results: Using open digitised text data held by the British Library, a DataSHIELD proof-of-concept infrastructure and prototype DataSHIELD functions for free-text analysis were developed.
Conclusions: Whilst it is possible to analyse free text within a DataSHIELD infrastructure, the challenge is creating generalised and resilient anti-disclosure methods for free-text analysis. There is a range of biomedical and health sciences applications for DataSHIELD methods of privacy-protected analysis of free text, including analysis of electronic health records and of qualitative data, e.g. from social media.

