The CQC Algorithm: Cycling in Graphs to Semantically Enrich and Enhance a Bilingual Dictionary

2012 ◽  
Vol 43 ◽  
pp. 135-171 ◽  
Author(s):  
T. Flati ◽  
R. Navigli

Bilingual machine-readable dictionaries are knowledge resources useful in many automatic tasks. However, compared to monolingual computational lexicons like WordNet, bilingual dictionaries typically provide a lower amount of structured information, such as lexical and semantic relations, and often do not cover the entire range of possible translations for a word of interest. In this paper we present Cycles and Quasi-Cycles (CQC), a novel algorithm for the automated disambiguation of ambiguous translations in the lexical entries of a bilingual machine-readable dictionary. The dictionary is represented as a graph, and cyclic patterns are sought in the graph to assign an appropriate sense tag to each translation in a lexical entry. Further, we use the algorithm's output to improve the quality of the dictionary itself, by suggesting accurate solutions to structural problems such as misalignments, partial alignments and missing entries. Finally, we successfully apply CQC to the task of synonym extraction.
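
The graph idea in this abstract can be illustrated with a minimal sketch. It assumes a toy English-Italian dictionary (the `ENTRIES` data, the `word#n` sense labels and the 1/length scoring are all hypothetical, not taken from the paper) and searches only for plain cycles; the quasi-cycles and the actual scoring function of CQC are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy dictionary: each sense lists its translations as plain,
# still ambiguous, words.
ENTRIES = {
    "language#1": ["lingua", "linguaggio"],
    "tongue#1": ["lingua"],        # the organ
    "lingua#1": ["tongue"],        # the organ
    "lingua#2": ["language"],      # system of communication
    "linguaggio#1": ["language"],
}

def build_graph(entries):
    """Link every sense to all senses of each word in its translation list."""
    by_word = defaultdict(list)
    for sense in entries:
        by_word[sense.split("#")[0]].append(sense)
    graph = {s: [t for w in words for t in by_word[w]]
             for s, words in entries.items()}
    return graph, by_word

def cycle_lengths(graph, source, first, max_len):
    """Lengths of simple cycles source -> first -> ... -> source (<= max_len edges)."""
    lengths = []
    stack = [(first, {source, first}, 1)]  # (node, visited nodes, edges used)
    while stack:
        node, visited, depth = stack.pop()
        for nxt in graph[node]:
            if nxt == source:
                lengths.append(depth + 1)
            elif nxt not in visited and depth + 1 < max_len:
                stack.append((nxt, visited | {nxt}, depth + 1))
    return lengths

def disambiguate(entries, sense, word, max_len=4):
    """Pick the sense of `word` that closes the most (and shortest) cycles."""
    graph, by_word = build_graph(entries)
    # Assumed scoring: shorter cycles count more (1 / length).
    scores = {c: sum(1 / n for n in cycle_lengths(graph, sense, c, max_len))
              for c in by_word[word]}
    return max(scores, key=scores.get), scores

best, scores = disambiguate(ENTRIES, "language#1", "lingua")
print(best)  # lingua#2
```

In this toy graph the "system of communication" sense of "lingua" closes a cycle of length 2 with "language#1", whereas the "organ" sense is reachable only through a longer cycle, so the former is selected.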

Literator ◽  
2016 ◽  
Vol 37 (1) ◽  
Author(s):  
Ketiwe Ndhlovu

The development of African languages into languages of science and technology is dependent on action being taken to promote the use of these languages in specialised fields such as technology, commerce, administration, media, law, science and education among others. One possible way of developing African languages is the compilation of specialised dictionaries (Chabata 2013). This article explores how parallel corpora can be interrogated using a bilingual concordancer (ParaConc) to extract bilingual terminology that can be used to create specialised bilingual dictionaries. An English–Ndebele Parallel Corpus was used as a resource and, through ParaConc, an alphabetic list was compiled from which headwords and possible translations were sought. These translations provided possible terms for entry in a bilingual dictionary. The frequency feature and ‘hot words’ tool in ParaConc were used to determine the suitability of terms for inclusion in the dictionary and to identify possible synonyms, respectively. Since parallel corpora are aligned and data are presented in context (Key Word in Context), it was possible to draw examples showing how headwords are used. This approach produced results quickly and accurately, whilst minimising the manual translation of terms. It was noted that the quality of the dictionary is dependent on the quality of the corpus, hence the need to create a representative and clean corpus must be emphasised. Although technology has multiple benefits in dictionary making, the research underscores the importance of collaboration between lexicographers, translators, subject experts and target communities so that representative dictionaries are created.


Author(s):  
Arbi Haza Nasution ◽  
Yohei Murakami ◽  
Toru Ishida

Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. For closely related languages in particular, the constraint-based approach has been shown to be useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. When the goal is to comprehensively create multiple bilingual dictionaries, it is still difficult, even with several existing machine-readable bilingual dictionaries, to determine the execution order of the constraint-based approach that reduces the total cost. Plan optimization is crucial in composing the order of bilingual dictionary creation with consideration of the methods and their costs. We formalize plan optimization for creating bilingual dictionaries as a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision, with language similarity and polysemy of the topology as the α and β parameters. It is further used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch, the results show a 61.5% cost reduction compared to the estimated all investment plans and a 39.4% cost reduction compared to the estimated MDP optimal plan. The MDP-based proposal outperformed the baseline on total cost.
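
The prior-to-posterior step mentioned in this abstract can be sketched with the standard conjugate update of a beta distribution. The numbers below are hypothetical, and the way the authors derive the two parameters from language similarity and polysemy is not reproduced; the sketch only shows how observed precision in one batch could yield the prior for the next.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Beta:
    """Beta distribution over the precision of an induction step."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected precision under this distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, total: int) -> "Beta":
        # Conjugate update: add correct pairs to alpha, incorrect pairs to beta.
        return Beta(self.alpha + correct, self.beta + (total - correct))

# Hypothetical prior for a language pair (expected precision 0.6).
prior = Beta(alpha=6.0, beta=4.0)

# Hypothetical first batch: 70 of 90 induced translation pairs judged correct.
posterior = prior.update(correct=70, total=90)

# The posterior of the first batch serves as the prior of the second batch.
second_batch_prior = posterior
print(second_batch_prior.mean)
```

A planner could then plug the expected precision into its cost and transition estimates; how exactly that is done in the MDP formulation is left open here.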


Author(s):  
М.А. Дударенко

A multilingual probabilistic topic model is proposed that takes into account both a bilingual dictionary and the links between documents of a parallel or comparable collection. Additive regularization of topic models (ARTM) is used to combine these two kinds of information. Two ways of using the bilingual dictionary are proposed: the first takes into account only the fact that two words are linked as translations, whereas the second learns the translation probabilities within each topic. The quality of the multilingual models is measured on a cross-language search task, in which the query is a document in one language and the search is performed among documents in another language. It is shown that combining word translations from a bilingual dictionary with linked documents improves cross-language search quality compared to models that use only one type of information. A comparison of different methods of incorporating bilingual dictionaries into the model shows that estimating translation probabilities not only improves the quality of the model but also makes it possible to determine a topical context (a set of topics) for each word-translation pair in which the translation is appropriate.


Author(s):  
Martina Nied Curcio

Misunderstandings between speakers of different languages occur not only on a linguistic level but also on a cultural one. Consultation of a bilingual dictionary does not necessarily help in this case, as information on the cultural level is often missing. In this paper we will discuss how bilingual dictionaries can draw attention to cultural divergences so that the dictionary user acquires cultural knowledge and is able to build intercultural competence. Examples from four bilingual dictionaries (German-Italian) are given to illustrate how culture-bound words are represented. For this purpose, a classification of culture-bound words is offered. Finally, the prerequisites and possibilities of an appropriate representation of culture-bound items in bilingual dictionaries will be proposed.


2016 ◽  
Vol 36 (1) ◽  
pp. 147
Author(s):  
Beatriz Sánchez Cárdenas ◽  
Pamela Faber

http://dx.doi.org/10.5007/2175-7968.2016v36nesp1p147

Research in terminology has traditionally focused on nouns. Considerably less attention has been paid to other grammatical categories such as adverbs. However, these words can also be problematic for the novice translator, who tends to use the translation correspondences in bilingual dictionaries without realizing that formal equivalence is not necessarily the same as textual equivalence. Semantic values, acquired in context, go far beyond dictionary meaning and are related to phenomena such as semantic prosody and preferences of lexical selection that can vary depending on text type and specialized domain. This research explored the reasons why certain adverbial discourse connectors, apparently easy to translate, are a source of translation problems that cannot be easily resolved with a bilingual dictionary. Moreover, this study analyzed the use of parallel corpora in the translation classroom and how it can increase the quality of text production. For this purpose, we compared student translations before and after receiving training on the use of corpus analysis tools.


2018 ◽  
Vol 8 (3) ◽  
pp. 71-74
Author(s):  
B. Vasantha ◽  
B. M. Meera ◽  
M. Dhanamjaya

Tremendous advancement in Information and Communication Technology has had an impact on all walks of life. The advent of the Internet and the World Wide Web has particularly affected the Library and Information domain. Library and Information Centers today play an important role in enhancing the quality of the academic environment and influence the basic and core activities of research centers. They help users to identify and access a variety of knowledge resources in different formats, such as electronic information resources in academic institutions. The purpose of this paper is to understand the usage pattern of electronic information resources by research scholars in an academic institute. A survey method is adopted to determine the frequency of use, the level of satisfaction with different resources, and the problems encountered while using electronic information resources at REVA University, Bengaluru.


Author(s):  
Yang Zhao ◽  
Jiajun Zhang ◽  
Yu Zhou ◽  
Chengqing Zong

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KG-enhanced NMT method. Specifically, we first induce new translation results for these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo-parallel sentence pairs that contain these induced entity pairs. Finally, the NMT model is jointly trained on the original and pseudo sentence pairs. Extensive experiments on Chinese-to-English and English-to-Japanese translation tasks demonstrate that our method significantly outperforms strong baseline models in translation quality, especially in handling the induced entities.


2021 ◽  
Author(s):  
Ender Mehmet Şahinkoç ◽  
Türker Tuğsal

The aim of this research is to analyze human capital, natural resources, and technological developments, which are the determinants of growth. The use of these resources has been evaluated with country data sets. In this research, the sources of growth have been examined; moreover, the growth performance of Turkey has also been analyzed. Human capital can be considered a prominent factor among the sources of growth. In this context, studies on measuring the quality of human capital have been examined. The use of natural resources and technology for economic growth is one of the areas examined in this research. Furthermore, the literature has been reviewed and the concept of growth has been explained theoretically in its traditional and modern aspects. To measure human capital, which is one of the sources of growth, data from the Turkish Statistical Institute (TUIK) and the World Bank have been used. The tables present data showing the development of the education level in Turkey over the years. In addition, Turkey's potential growth rate and annual growth rates have been evaluated. To conclude, the growth performance of Turkey has been evaluated and the importance of increasing the quality of human capital is emphasized. Recommendations have been made to increase potential growth and ensure sustainable growth. Some structural problems have been identified as a result of examining the development of the Turkish economy over the years. These structural problems have been addressed and solutions investigated.


2019 ◽  
Vol 19 (4) ◽  
pp. 99-107
Author(s):  
Larisa Alimpieva

In a communicative act, Russian particles fulfil several functions concurrently. This makes them an important unit of the functional-pragmatic sphere of the Russian language, which is characterized by its national specificity and connotativity. The codification of Russian particles in bilingual lexicography is a complicated problem. The main difficulty in compiling a dictionary lemma is the filiation (division of meanings) of Russian particles and their rendering by the lexical means of a foreign language. The existing lexicographic descriptions of Russian particles in bilingual dictionaries inadequately reflect the structure and content of their meanings. The aim of the article is to consider some theoretical problems of describing Russian particles by means of a second (target) language in the dictionary lemmas of bilingual dictionaries.

