General-purpose statistical translation engine and domain specific texts

Terminology ◽  
2004 ◽  
Vol 10 (1) ◽  
pp. 131-153 ◽  
Author(s):  
Philippe Langlais ◽  
Michael Carl

The past decade has witnessed exciting work in the field of Statistical Machine Translation (SMT). However, accurate evaluation of its potential in real-life contexts is still an open question. In this study, we investigate the behavior of an SMT engine faced with a corpus far different from the one it has been trained on. We show that terminological databases are obvious resources that should be used to boost the performance of a statistical engine. We propose and evaluate one way of integrating terminology into an SMT engine which yields a significant reduction in word error rate.

2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Longyue Wang ◽  
Derek F. Wong ◽  
Lidia S. Chao ◽  
Yi Lu ◽  
Junwen Xing

Data selection has shown significant improvements in the effective use of training data by extracting sentences from large general-domain corpora to adapt statistical machine translation (SMT) systems to in-domain data. This paper performs an in-depth analysis of three different sentence selection techniques. The first one is cosine tf-idf, which comes from the realm of information retrieval (IR). The second is a perplexity-based approach, which can be found in the field of language modeling. These two data selection techniques applied to SMT have already been presented in the literature. However, edit distance for this task is proposed in this paper for the first time. After investigating the individual models, a combination of all three techniques is proposed at both corpus level and model level. Comparative experiments are conducted on a Hong Kong law Chinese-English corpus and the results indicate the following: (i) the constraint degree of similarity measuring is not monotonically related to domain-specific translation quality; (ii) the individual selection models fail to perform effectively and robustly; but (iii) bilingual resources and combination methods are helpful to balance out-of-vocabulary (OOV) and irrelevant data; (iv) finally, our method achieves the goal of consistently boosting the overall translation performance, which can ensure optimal quality of a real-life SMT system.
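The cosine tf-idf selection technique named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sentences, the whitespace tokenization, and the smoothed idf formula are all assumptions made for illustration. The in-domain data is treated as a single query document, and general-domain sentences are ranked by cosine similarity to it.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption of this sketch, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_sentences(in_domain, general, top_k):
    """Pick the top_k general-domain sentences closest to the in-domain text."""
    query = " ".join(in_domain).lower().split()
    candidates = [s.lower().split() for s in general]
    vectors = tfidf_vectors([query] + candidates)
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, v), s) for v, s in zip(cand_vecs, general)]
    scored.sort(key=lambda pair: -pair[0])  # highest similarity first
    return [s for _, s in scored[:top_k]]


# Hypothetical toy data, for illustration only.
in_domain = ["the court shall hear the appeal",
             "the ordinance applies to the tenant"]
general = ["the weather is nice today",
           "the court may dismiss the appeal",
           "he plays football"]
print(select_sentences(in_domain, general, top_k=1))
```

In a real setting the selected sentences (with their translations) would be added to the training data; the perplexity-based and edit-distance criteria from the abstract would replace `cosine` with a different scoring function.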


Machine Translation systems are still far from perfect, and the concept of Interactive Machine Translation (IMT) was introduced to improve their performance. This paper proposes an IMT system which uses Statistical Machine Translation and a bilingual corpus on which several algorithms (Word Error Rate, Position Independent Error Rate, Translation Error Rate, n-grams) are implemented to translate text from English to Indian languages. Experiments show that the proposed system improves both the speed and the productivity of human translators.
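Word error rate, one of the measures listed in the abstract above, is commonly computed as the word-level edit distance between a hypothesis and a reference, divided by the reference length. The following is a minimal sketch under that standard definition; the function name and the whitespace tokenization are assumptions, and it is not the code of the system described.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Position-independent error rate differs in that it ignores word order, so it would compare bags of words rather than aligned sequences.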


Author(s):  
Nicolás José Fernández-Martínez ◽  
Carlos Periñán-Pascual

Location-based systems require rich geospatial data in emergency and crisis-related situations (e.g. earthquakes, floods, terrorist attacks, car accidents or pandemics) for the geolocation of not only a given incident but also the affected places and people in need of immediate help, which could potentially save lives and prevent further damage to urban or environmental areas. Given the sparsity of geotagged tweets, geospatial data must be obtained from the locative references mentioned in textual data such as tweets. In this context, we introduce nLORE (neural LOcative Reference Extractor), a deep-learning system that serves to detect locative references in English tweets by making use of the linguistic knowledge provided by LORE. nLORE, which captures fine-grained complex locative references of any type, outperforms not only LORE, but also well-known general-purpose or domain-specific off-the-shelf entity-recognizer systems, both qualitatively and quantitatively. However, LORE shows much better runtime efficiency, which is especially important in emergency-based and crisis-related scenarios that demand quick intervention to send first responders to affected areas and people. This highlights the often undervalued yet very important role of rule-based models in natural language processing for real-life and real-time scenarios.


2018 ◽  
Vol 27 (2) ◽  
pp. 71-85
Author(s):  
Eri Shigematsu

Daniel Defoe’s fictional autobiographies represent the life of an individual through personal memories. Although he has often been associated with circumstantial realism rather than psychological realism, Defoe in fact represents the psychological as well as social and economic realities of his characters. In Defoe’s first-person autobiographical narratives, the person who narrates (i.e. the narrating self) and the one who experiences (i.e. the experiencing self) share the same pronoun, ‘I’, which exhibits a fluctuating internal tension between the two selves. This article aims to investigate Defoe’s psychological realism in terms of this internal tension, focusing on the narrative techniques for representing consciousness in which the points of view of the two selves are mingled. The representation of consciousness by means of what is called free indirect speech and thought (FIST) was still under development in the early eighteenth century. In Defoe’s fictions, however, the internal tension between the two selves is abundantly indicated by his use of FIST and his handling of directness (the-experiencing-self-oriented deictic and expressive elements) within indirect representations of consciousness (indirect speech and thought (IST) and narrator’s representation of speech/thought act (NRSA/TA)) and within narration (N). The analysis demonstrates that, like FIST, direct elements used in indirect consciousness representation categories show the narrating self’s empathetic identification with the past self, which simultaneously evokes the reader’s empathetic feelings towards the psychology of the experiencing self. It consequently reveals that the creation of empathetic effects through directness helped Defoe to represent the psyches of individuals in remembering as in real life.


enadakultura ◽  
2021 ◽  
Author(s):  
Mari Simonishvili

What creates Givi Margvelashvili's work? "In the language of aesthetics, this is called an artistic game, in the language of the heart, it creates a boundless thirst for goodness, which, if it is not satisfied in real life, if it cannot eliminate violence here, spreads its wings in the world of books" (Margvelashvili 2018: 18). Changing the conditioned story with a literary game - this is the starting concept of the German-speaking Georgian author, and "at the core of his poetics is an attempt to return man to his original, fundamental state - the openness of the world," writes the German editor of Margvelashvili's book "Life in Ontotext" ("Das Leben im Ontotext") (Margvelashvili 2018: 11). Givi Margvelashvili is a victim of two dictatorships, Nazism and Communism. He started writing at the age of 30, when, after leaving the Sachsenhausen concentration camp, he found himself in a completely foreign environment, in his historical homeland, and only the German language connected his aunt's family to his old life. Later, when writing about his own identity, the writer always emphasized the fact that the German language is his linguistic homeland (emphasis added by Lexo Doreuli). "From the past, only language was selected for him, language was a living part of a deprived life, which no one could take away except time. At times, however, his memory and talent met with unprecedented resistance. This is how it became a living island of the German language in the Georgian environment and in a huge prison, on this doubly lonely island the Georgian-German built a huge oil rig of freedom with ascetic loneliness and hard work" (Margvelashvili 2018: 216). As we know, the writer was sick earlier, the boy brought up under the supervision of German nannies did not understand the Georgian language and essentially this aspect of his life should have become a feature of fate - "he was not bothered by a wordless, internal deal with censorship.
Locked in complete solitude with his characters, unknown, he experienced the joy that comes with complete freedom of expression: he wrote as he wanted" (Margvelashvili 2018: 218). On the one hand, working on German-language literature, and on the other hand, the literary disagreement that Margvelashvili showed against the current regime, increasingly formed the basis for saying that "language and theme choose the writer" (emphasis added by Naira Gelashvili) and not vice versa. That was why his characters, the inhabitants of his inaccessible book world, had to meet the reader in a new reality. This other reality was the book "New America" discovered by Givi Margvelashvili (emphasis added by Naira Gelashvili). He is the hero of this book and he is looking forward to the visit of a real person (reader) between the two covers, he even says: "Once the door of your house is opened ... and write a poem of your own" (Margvelashvili 2018: 110). And thus in a one-room apartment the lone author creates a new reality in which the stories take on a grotesque look and the reader is also entangled in a dizzying labyrinth of fantasy.


2015 ◽  
Vol 54 ◽  
pp. 159-192
Author(s):  
Lluís Formiga ◽  
Alberto Barrón-Cedeño ◽  
Lluís Màrquez ◽  
Carlos A. Henríquez ◽  
José B. Mariño

In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but also to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again provides a significant improvement in the translation quality of a general-purpose baseline SMT system.
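The third step above, a linear combination of a new translation model with the original one, can be sketched as a weighted interpolation of phrase translation probabilities. This is only an illustration under stated assumptions: the dict-based phrase tables, the function name `interpolate_models`, the weight `lam`, and the toy entries are hypothetical and are not taken from the article.

```python
def interpolate_models(base, new, lam):
    """Linearly combine two phrase tables.

    base, new: dicts mapping (source_phrase, target_phrase) -> probability.
    lam: weight of the new model; the base model gets 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    pairs = set(base) | set(new)
    # A phrase pair missing from one table contributes probability 0 there.
    return {
        pair: lam * new.get(pair, 0.0) + (1.0 - lam) * base.get(pair, 0.0)
        for pair in pairs
    }


# Hypothetical toy phrase tables, for illustration only.
base_model = {("casa", "house"): 0.8, ("casa", "home"): 0.2}
feedback_model = {("casa", "home"): 0.6, ("casa", "household"): 0.4}

combined = interpolate_models(base_model, feedback_model, lam=0.25)
for pair, prob in sorted(combined.items()):
    print(pair, round(prob, 3))
```

Because both inputs are probability distributions over targets for the same source phrase, the interpolated scores still sum to one; in practice the weight would be tuned on held-out data.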


2009 ◽  
Vol 3 (2) ◽  
Author(s):  
A. Agovic

Over the past year we have studied the challenges that must be overcome before we can introduce assistive robots in an operating room. We consider top among the issues a human-robot interface and an instrument-robot interface. In order for an autonomous mechanism to serve up instruments, it must have domain-specific knowledge about the nature of each instrument. The robot must be able to track the state of each instrument under its management. To this end we examine the technical requirements of an instrument server. The second area of interest, and the more unpredictable one, is the problem of interaction between a human and a machine. In the past we have looked at human speech as a medium of communication with the robot. Going beyond that, we also examine the interaction that occurs at the haptic level. Here we would like to know what precisely could be conveyed to the robot and from the robot just by a touch. In microscope is undesirable and touch becomes a valuable means of communication.


Author(s):  
Hu Xu ◽  
Bing Liu ◽  
Lei Shu ◽  
Philip S. Yu

Learning high-quality domain word embeddings is important for achieving good performance in many NLP tasks. General-purpose embeddings trained on large-scale corpora are often sub-optimal for domain-specific applications. However, domain-specific tasks often do not have large in-domain corpora for training high-quality domain embeddings. In this paper, we propose a novel lifelong learning setting for domain embedding. That is, when performing the new domain embedding, the system has seen many past domains, and it tries to expand the new in-domain corpus by exploiting the corpora from the past domains via meta-learning. The proposed meta-learner characterizes the similarities of the contexts of the same word in many domain corpora, which helps retrieve relevant data from the past domains to expand the new domain corpus. Experimental results show that domain embeddings produced from such a process improve the performance of the downstream tasks.


Author(s):  
Andri Setyorini ◽  
Niken Setyaningrum

Background: Old age is the final stage of the human life cycle, an inevitable part of the life process that every individual will experience. At this stage the individual undergoes many changes, both physical and mental, especially declines in various functions and abilities he or she once had. A preliminary study at Social House Tresna Wreda Yogyakarta Budhi Luhur Unit found 16 elderly residents who experience physical immobilization. The social house runs various activities for elderly residents who are still active, but those who experience muscle weakness are not able to follow the exercises, so they need ROM (Range Of Motion) exercise. Objective: The general purpose of this research is to determine the effect of Active Assistive Range Of Motion (ROM) training on increasing the range of motion of joints in elderly people who experience physical immobility at Social House of Tresna Werdha Yogyakarta Unit Budhi Luhur. Methods: This pre-experimental study used a One Group Pretest Posttest design in which the range of motion of the joints was measured before (pretest) and after (posttest) ROM training was performed. The subjects were all elderly residents with impaired physical mobility at Social House Tresna Wreda Yogyakarta Unit Budhi Luhur, a total of 14 people. The data were analyzed with the paired sample t-test. Results: The results show that ROM (Range of Motion) Active training has an effect on increasing the range of motion of joints in elderly people who experience physical immobility at Social House Tresna Wredha Yogyakarta Unit Budhi Luhur. Conclusion: ROM (Range of Motion) Active training has an effect on increasing the range of motion of joints in elderly people who experience physical immobility at Social House Tresna Wredha Yogyakarta Unit Budhi Luhur.
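The paired sample t-test used for the pretest/posttest design above compares each subject's two measurements through their difference: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. The sketch below shows the computation only; the function name and the measurements are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for paired measurements."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    # Per-subject change between the posttest and the pretest.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = sd_diff / math.sqrt(len(diffs))
    return mean_diff / standard_error, len(diffs) - 1


# Hypothetical joint range-of-motion values in degrees, for illustration only.
pretest = [10, 12, 14]
posttest = [12, 15, 16]
t_value, dof = paired_t_statistic(pretest, posttest)
print(t_value, dof)
```

The resulting t value is then compared against the t distribution with the given degrees of freedom to obtain a p-value, which a statistics package would normally report directly.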


2011 ◽  
Vol 13 (2) ◽  
pp. 201-171
Author(s):  
Nāṣir Al-Dīn Abū Khaḍīr

The ʿUthmānic way of writing (al-rasm al-ʿUthmānī) is a science that specialises in the writing of Qur'anic words in accordance with a specific ‘pattern’. It follows the writing style of the Companions at the time of the third caliph, ʿUthmān b. ʿAffān, and was attributed to ʿUthmān on the basis that he was the one who ordered the collection and copying of the Qur'an into the actual muṣḥaf. This article aims to expound on the two fundamental functions of al-rasm al-ʿUthmānī: that of paying regard to the ‘correct’ pronunciation of the words in the muṣḥaf, and the pursuit of the preclusion of ambiguity which may arise in the mind of the reader and his auditor. There is a further practical aim for this study: to show the connection between modern orthography and the ʿUthmānic rasm in order that we, nowadays, are thereby able to overcome the problems faced by calligraphers and writers of the past in their different ages and cultures.

