Cognitive Aspects Of Electronic Text Processing, Vol. LVIII In The Series Advances In Discourse Processes [Book Review]

1998 ◽  
Vol 41 (1) ◽  
pp. 77-79
Author(s):  
J.C. Redish
Author(s):  
Andrei Mikheev

Electronic text is essentially just a sequence of characters, but the majority of text-processing tools operate in terms of linguistic units such as words and sentences. Tokenization is the process of segmenting text into words, and sentence splitting is the process of determining sentence boundaries in the text. In this chapter we describe the major challenges of text tokenization and sentence splitting in different languages, and outline various computational approaches to tackling them.
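The chapter surveys computational approaches rather than prescribing one; as a minimal illustration of the two tasks and of why they are hard, here is a naive regex-based sketch (not the method the chapter develops). The abbreviation failure in the last line is exactly the kind of challenge the chapter discusses.

```python
import re

def tokenize(sentence):
    """Split a sentence into word and punctuation tokens.
    Contractions like "didn't" are kept as one token."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? when followed by
    whitespace and an uppercase letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())

print(split_sentences("It rained. We left."))
# A known failure mode: the abbreviation "Dr." is wrongly treated as a
# sentence boundary, since "Smith" starts with an uppercase letter.
print(split_sentences("Dr. Smith arrived."))
```

Robust splitters therefore combine such surface cues with abbreviation lists or statistical models trained on labeled boundaries.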


2004 ◽  
Vol 36 (4) ◽  
pp. 539-548
Author(s):  
Elizabeth Dutro

Author(s):  
Meftah Mohammed Charaf Eddine

In the field of machine translation, ambiguity at both the lexical (dictionary) and structural levels remains one of the hardest problems. Researchers in this field use different approaches, the most important of which is machine learning in its various forms. The goal of the approach we propose in this article is to define a new concept of electronic text that is free from any lexical or structural ambiguity. We use a semantic coding system that attaches to the original electronic text (via the text editor interface) the meanings intended by the author: the author specifies the intended sense of each word that could be a source of ambiguity. The proposed approach can be used with any type of electronic text (text-processing applications, web pages, email text, etc.). In the experiments we conducted with it, we obtained a very high accuracy rate, and we argue that the problem of lexical and structural ambiguity can be solved completely. Under this new concept of electronic text, the text file contains not only the text itself but also, in the form of symbols, the exact meanings intended by the writer. These semantic symbols are used during machine translation to obtain a translated text entirely free of lexical and structural ambiguity.
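The abstract describes the mechanism only at a high level; the sense inventory, tag syntax (`word#S2`), and function names below are illustrative assumptions, not the authors' actual encoding. The sketch shows the two halves of the idea: the author attaches a sense symbol at writing time, and the translator later resolves it without guessing.

```python
# Hypothetical sense inventory: each ambiguous word maps symbols to senses.
SENSE_INVENTORY = {
    "bank": {"S1": "financial institution", "S2": "river edge"},
}

def annotate(text, choices):
    """Attach an inline sense symbol (e.g. bank#S2) to each word the
    author has disambiguated; other words pass through unchanged."""
    out = []
    for token in text.split():
        core = token.rstrip(".,!?")
        punct = token[len(core):]
        if core.lower() in choices:
            out.append(core + "#" + choices[core.lower()] + punct)
        else:
            out.append(token)
    return " ".join(out)

def resolve(annotated):
    """At translation time, each tagged word maps unambiguously to
    the sense the author selected."""
    senses = []
    for token in annotated.split():
        if "#" in token:
            word, rest = token.split("#", 1)
            senses.append((word, SENSE_INVENTORY[word.lower()][rest.rstrip(".,!?")]))
    return senses

tagged = annotate("We sat by the bank.", {"bank": "S2"})
print(tagged)           # the sense symbol travels inside the text file
print(resolve(tagged))  # the translator reads the sense, not a guess
```

The design point is that disambiguation happens once, at authoring time, so the translation step needs no statistical disambiguation at all.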


Automatica ◽  
1996 ◽  
Vol 32 (3) ◽  
pp. 303-304
Author(s):  
Huibert Kwakernaak

Terminology ◽  
1998 ◽  
Vol 5 (2) ◽  
pp. 147-159 ◽  
Author(s):  
Johann Gamper ◽  
Oliviero Stock

The manual acquisition of terminological material from domain-specific texts is a very time-consuming task. Recent advances in text-processing research provide a basis for automating it, and computer-assisted term acquisition improves both the quantity and the quality of terminological work. This paper gives a brief overview of this new approach to terminology acquisition. Three subtasks are distinguished: compilation of an electronic text corpus, extraction of terminological data, and management of terminological data. Each subtask is discussed in some detail, identifying the core problems as well as proposed solutions. As a concrete initiative in this emerging field, we present an ongoing research project at the European Academy Bolzano, which illustrates the importance of computer-assisted terminology acquisition and the concrete steps recently taken toward it. The paper concludes with a summary of five selected papers presented at a workshop on corpus-based terminology in Bolzano. The full papers are published in this volume and in volume 4(2) of this journal.
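The paper's second subtask, extraction of terminological data, is typically bootstrapped by collecting frequent candidate phrases from the corpus. The sketch below is a crude frequency-based stand-in (not the project's actual extractor): count bigrams that contain no function words, on the assumption that recurring content-word pairs are term candidates for a human terminologist to review.

```python
import re
from collections import Counter

# Minimal stopword list; a real system would use a full function-word lexicon.
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "is", "for", "from"}

def candidate_terms(corpus, n=2):
    """Count n-gram term candidates: sequences of n words containing
    no stopwords. Frequency serves as a rough termhood signal."""
    words = re.findall(r"[a-z]+", corpus.lower())
    counts = Counter()
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        if not any(w in STOPWORDS for w in gram):
            counts[" ".join(gram)] += 1
    return counts

corpus = ("Terminology acquisition requires a text corpus. "
          "The text corpus is compiled from domain texts.")
print(candidate_terms(corpus).most_common(2))
```

In practice such frequency lists are only the first filter; the extracted candidates still pass through the third subtask, management and validation of terminological data.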

