Conceptual analysis of parallel corpus collected from the Web

2006 ◽  
Vol 57 (5) ◽  
pp. 632-644 ◽  
Author(s):  
Kar Wing Li ◽  
Christopher C. Yang
2009 ◽  
Vol 54 (1) ◽  
pp. 181-188 ◽  
Author(s):  
Tayebeh Mosavi Miangah

Abstract In recent years the exploitation of large text corpora to solve various kinds of linguistic problems, including those of translation, has become commonplace. Yet a large-scale English-Persian corpus is still unavailable, because of certain difficulties and the amount of work required to overcome them. The project reported here is an attempt to build an English-Persian parallel corpus composed of digital texts and Web documents containing little or no noise. The Internet is useful because translations of existing texts are often published on the Web. The task is to find parallel pages in English and Persian, judge their translation quality, and download and align them. The resulting corpus is open, so more material can be added as the need arises. One of the main activities associated with building such a corpus is developing software for parallel concordancing, in which a user enters a search string in one language and sees every occurrence of that string in context, together with the corresponding sentences in the target language. Our intention is to construct general translation memory software using the present English-Persian parallel corpus.
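The abstract describes parallel concordancing only from the user's side. As a minimal sketch, assuming the corpus is already sentence-aligned and held in memory as (English, Persian) pairs, a keyword-in-context (KWIC) search over the English side that returns each aligned Persian sentence could look like this. The function name `concordance`, the `width` parameter and the sample sentences are hypothetical and are not taken from the project's software.

```python
import re
from typing import List, Tuple

# Hypothetical in-memory corpus: each entry pairs an English sentence with
# its aligned Persian translation (sentence alignment is assumed done).
AlignedPair = Tuple[str, str]


def concordance(corpus: List[AlignedPair], query: str,
                width: int = 30) -> List[Tuple[str, str]]:
    """Return (KWIC line, aligned target sentence) for every source-side hit."""
    # Whole-word, case-insensitive match of the query on the English side.
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in corpus:
        for match in pattern.finditer(source):
            left = source[max(0, match.start() - width):match.start()]
            right = source[match.end():match.end() + width]
            # Left context right-aligned, hit in brackets, then right context.
            kwic = f"{left:>{width}}[{match.group(0)}]{right}"
            hits.append((kwic, target))
    return hits


if __name__ == "__main__":
    sample = [
        ("The book is on the table.", "کتاب روی میز است."),
        ("She reads a book every week.", "او هر هفته یک کتاب می‌خواند."),
        ("It is raining.", "باران می‌بارد."),
    ]
    for line, translation in concordance(sample, "book"):
        print(line, "|", translation)
```

Right-aligning the left context keeps every hit in the same column, which is the usual concordance layout; a real system would index the corpus rather than scan it with a regular expression on every query.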


2011 ◽  
Author(s):  
Vivek Kumar Rangarajan Sridhar ◽  
Luciano Barbosa ◽  
Srinivas Bangalore
Keyword(s):  

2003 ◽  
Vol 29 (3) ◽  
pp. 349-380 ◽  
Author(s):  
Philip Resnik ◽  
Noah A. Smith

Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.
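The abstract mentions classification based on structural features of documents without describing them. The sketch below is a simplified illustration of one such idea: reduce each HTML page to a sequence of markup tokens and generic text-chunk tokens, align the two sequences, and report the share of tokens left unaligned. Python's `difflib.SequenceMatcher` stands in for the alignment step. The function names and token format are assumptions, and this is not the STRAND implementation. It ignores text-chunk lengths and any learned classifier.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List


class _Linearizer(HTMLParser):
    """Flatten an HTML page into a sequence of markup and text-chunk tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"START:{tag}")

    def handle_endtag(self, tag):
        self.tokens.append(f"END:{tag}")

    def handle_data(self, data):
        if data.strip():
            # Text becomes a generic chunk marker, so pages written in
            # different languages are compared on structure alone.
            self.tokens.append("CHUNK")


def linearize(html: str) -> List[str]:
    parser = _Linearizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def difference_percentage(html_a: str, html_b: str) -> float:
    """Share of tokens (over both pages) left unmatched by the alignment."""
    a, b = linearize(html_a), linearize(html_b)
    if not a and not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - (2 * matched) / (len(a) + len(b))
```

A low value suggests two pages share a layout, as translated pages on the same site often do. In practice a score like this would be one input to a classifier rather than a decision on its own.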


2002 ◽  
Vol 7 (1) ◽  
pp. 1-20 ◽  
Author(s):  
Tomaž Erjavec

The paper presents an annotated parallel Slovene-English corpus developed within the EU ELAN project. The IJS-ELAN corpus was compiled to be a widely distributable dataset for language engineering and for translation and terminology studies. The corpus contains 1 million words from fifteen recent terminology-rich texts. The corpus is sentence-aligned and word-tagged with context-disambiguated morphosyntactic descriptions and lemmas. These descriptions model simple feature structures, whose structure is shared between Slovene and English. The corpus is encoded according to the Guidelines for Text Encoding and Interchange and is freely available on the Web for downloading. Additionally, access to IJS-ELAN is available via a powerful Web concordancer.


2021 ◽  
pp. 135481662110357
Author(s):  
Natalia Vila-Lopez ◽  
Inés Küster-Boluda

Sharing economy research has risen exponentially during the last 4 years. Although several theoretical reviews of this topic have been published, a conceptual analysis based on bibliometric techniques and science mapping tools is lacking. Within this framework, this article has two aims: (i) to carry out a performance analysis to identify the outstanding themes and (ii) to visually present the scientific structure of research topics in the sharing-collaborative economy, as well as its evolution, to identify future directions. Data were drawn from the Web of Science Citation Index. Intelligent techniques, and more specifically the SciMAT tool (based on co-word analysis and h-index analysis), were applied to a sample of 940 indexed papers from 2010 to 2020 (with 10,652 global citations). Our results show that the new post-pandemic era requires the sharing economy industry to explore alternative paths: to improve trust, to innovate, to search for authenticity and experiences, to address tourist motivations based on sustainability, and to use big data and manage overtourism.
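The abstract names co-word analysis and h-index analysis but does not give formulas. As an illustration only, the two measures can be sketched as follows. The first is the standard h-index. The second is a keyword-pair similarity based on the equivalence index $e_{ij} = c_{ij}^2 / (c_i c_j)$, a common normalisation in co-word analysis. Whether this study used that exact measure is an assumption, and the function names and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def equivalence_index(papers: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """e_ij = c_ij^2 / (c_i * c_j) for every keyword pair that co-occurs.

    c_i counts papers containing keyword i; c_ij counts papers containing both.
    Pairs are stored as alphabetically sorted tuples.
    """
    occurrences = Counter(k for kws in papers for k in set(kws))
    co_occurrences = Counter(
        pair for kws in papers for pair in combinations(sorted(set(kws)), 2)
    )
    return {
        (a, b): count ** 2 / (occurrences[a] * occurrences[b])
        for (a, b), count in co_occurrences.items()
    }
```

For example, `h_index([10, 8, 5, 4, 3])` is 4, because four papers have at least four citations each but the fifth has only three. The equivalence index equals 1.0 when two keywords always appear together and approaches 0 as they co-occur less often relative to their individual frequencies.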


Author(s):  
Sal Hagen ◽  
Marc Tuters ◽  
Stijn Peeters ◽  
Emillie De Keulenaar ◽  
Jack Wilson ◽  
...  

This panel brings together research into the cross-platform relations between radical Web subcultures and how they are constitutive of “hyper-antagonistic” politics in broader Web discourses. The papers share a concern with vernacular practices of “fringe” platforms favoured by an insurgent far-right movement and their relations to more “mainstream” social media. They engage with the concept of “transcoding between milieus” (Deleuze & Guattari 1987, 322) as a means to empirically describe multiple transversal processes across different strata of the Web in which “one milieu serves as the basis for another” (313). All papers ground their conceptual analysis in data-driven empirical approaches using historical datasets ranging from “mainstream” platforms like YouTube to more “fringe” spaces like 4chan. The papers furthermore all use 4chan’s far-right /pol/ board as a reference point for a vernacular “hyper-antagonistic” style that emerged out of this period, a style that has often been related to the “alt-right”. Together, the four papers in this panel offer insights into the apparent insurgency of far-right subcultures within broader online discourse in the Anglo-American context over the last half decade. Each does so with a particular focus: subcultural conflict between Tumblr and 4chan, the transcoding of the “Kekistan” meme between 4chan and YouTube, the emergence of far-right vernacular in the comments of Breitbart News, and the robustness of hyper-antagonistic discourse after deplatforming measures.


Sign in / Sign up

Export Citation Format

Share Document