Digital Text Collections, Linguistic Research Data, and Mashups: Notes on the Legal Situation

This paper discusses the advantages of encoded digital text over printed text, from a researcher's perspective. The traditional notion of text corpus as a well-considered collection of texts is related to the huge amounts of digital texts that are currently available on the web. After examples of useful digitalization initiatives and available digital resources, information is given about the users and uses of the text corpora stored at the Institute for Dutch Lexicology. Attention is paid to some obstacles in building or using text collections. The conclusion is that up till now the digital medium primarily facilitates research rather than evokes new linguistic research questions.

REPRESENTASI IDEOLOGI DAN KEKUASAAN DALAM BAHASA: KAJIAN TEKS MEDIA [Representation of Ideology and Power in Language: A Study of Media Texts]

HUMANIKA ◽

10.14710/humanika.22.2.92-102 ◽

2015 ◽

Vol 22 (2) ◽

pp. 92

Author(s):

Suharyo Suharyo ◽

Surono Surono ◽

Mujid F Amin

Keyword(s):

Discourse Analysis ◽

Social Context ◽

Critical Discourse Analysis ◽

Research Data ◽

Social Dimension ◽

Critical Discourse ◽

Research Model ◽

Linguistic Research ◽

The Social ◽

Text Context

This article is based on the assumption that language does not exist in a social vacuum. Language is more than a set of words; it is not merely linguistic but also social. Therefore, current linguistic research should take the social dimension into account through critical analysis, as in van Dijk’s critical discourse analysis (CDA) research model. Critical discourse analysis research considers the text, context, social cognition, and social analysis/context. The research steps include exposing the macrostructure (thematic), the superstructure (schematic), and the microstructure, which consists of semantics, syntax, stylistics, and rhetoric. Accordingly, this study uses the read-and-record method, and the research data were collected from the newspapers Suara Merdeka and Kompas. The study concludes that language represents ideology and (symbolic) power, both individual and communal.

Data Protection Laws & Ways to conformity

ARID International Journal of Social Sciences and Humanities ◽

10.36772/arid.aijssh.2021.353 ◽

2021 ◽

Author(s):

رضوان اسخيطة

Keyword(s):

Data Protection ◽

Research Method ◽

Research Data ◽

Electronic Contribution ◽

General Data Protection Regulation ◽

Main Research ◽

Development Level ◽

Legal Situation ◽

General Data ◽

Advanced Development

This research, Data Protection Laws and Ways to Conformity, discusses data protection laws in Arab countries and takes the General Data Protection Regulation (GDPR) as its main example. The main purpose is to show which points the GDPR has developed and to compare them with the legal situation in Arab countries; the UAE is used as the main example of the Arab countries because of its advanced level of development in technical and legal fields. The research relies mainly on books, electronic contributions, and online guides as references and uses comparison as its main research method. The new points in the European laws and the extension of the GDPR's application to every country in specific cases are among the main discussion points of this research. The research concludes with recommendations on the best ways to conform to the protection laws and on the features behind this conformity.

Providing Digital Infrastructure for Audio-Visual Linguistic Research Data with Diverse Usage Scenarios: Lessons Learnt

Publications ◽

10.3390/publications8020033 ◽

2020 ◽

Vol 8 (2) ◽

pp. 33

Author(s):

Hanna Hedeland

Keyword(s):

Linguistic Diversity ◽

Research Data ◽

User Needs ◽

Research Software ◽

Technical Requirements ◽

Digital Infrastructure ◽

Linguistic Research ◽

Management Approaches ◽

The One ◽

Digital Research

This article describes the development of the digital infrastructure at a research data centre for audio-visual linguistic research data, the Hamburg Centre for Language Corpora (HZSK) at the University of Hamburg in Germany, over the past ten years. The typical resource hosted in the HZSK Repository, the core component of the infrastructure, is a collection of recordings with time-aligned transcripts and additional contextual data, a spoken language corpus. Since the centre has a thematic focus on multilingualism and linguistic diversity and provides its service to researchers within linguistics and other disciplines, the development of the infrastructure was driven by diverse usage scenarios and user needs on the one hand, and by the common technical requirements for certified service centres of the CLARIN infrastructure on the other. Beyond the technical details, the article also aims to be a contribution to the discussion on responsibilities and services within emerging digital research data infrastructures and the fundamental issues in sustainability of research software engineering, concluding that in order to truly cater to user needs across the research data lifecycle, we still need to bridge the gap between discipline-specific research methods in the process of digitalisation and generic digital research data management approaches.

AUTOMATIC RETRIEVAL AND THE FORMALIZATION OF MULTI WORDS EXPRESSIONS WITH F-WORDS IN THE CORPUS OF CONTEMPORARY AMERICAN ENGLISH

Jurnal Humaniora ◽

10.22146/jh.v27i2.8709 ◽

2016 ◽

Vol 27 (2) ◽

pp. 156

Author(s):

Prihantoro Prihantoro

Keyword(s):

American English ◽

Computational Resource ◽

Digital Text ◽

Automatic Retrieval ◽

Text Collections ◽

Research Problems

The research problems in this study are 1) what role lexicogrammar plays in determining the polarity of F-words and 2) how to formalize it for corpus processing. The data are obtained from the Corpus of Contemporary American English (COCA). In this corpus, the F-word is shown to have the highest frequency compared with its distribution across other corpora. Corpus methodology is applied by sending queries to the COCA interface to retrieve F-words. The combinations of tokens surrounding F-words yield the phrase and clause units accompanying them, which are significant cues for determining F-word polarity. The polarity is later shown to be not necessarily negative. I also designed a computational resource that allows F-words to be retrieved offline, so that users can apply it to any digital text collection.
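
The retrieval of a target word together with its surrounding tokens, as described in the abstract above, can be illustrated with a minimal keyword-in-context sketch. This is not the author's computational resource or the COCA interface: the function name `concordance`, the `window` parameter, the sample sentence, and the mild stand-in target word are all assumptions made for illustration.

```python
import re


def concordance(text, target, window=3):
    """Return (left context, match, right context) for each occurrence of target.

    Hypothetical helper: tokenization is a simple regex over lowercased text,
    not a linguistically informed tokenizer.
    """
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]       # up to `window` tokens before
            right = tokens[i + 1:i + 1 + window]      # up to `window` tokens after
            hits.append((left, tok, right))
    return hits


# Assumed sample text; "darn" stands in for the words studied in the paper.
sample = "That was a darn good movie. Darn, I missed the bus again."
hits = concordance(sample, "darn", window=2)
for left, word, right in hits:
    print(" ".join(left), f"[{word}]", " ".join(right))
```

The context tokens returned on each side are the kind of cue the abstract refers to: "good movie" on the right suggests a positive use, while "i missed" suggests a negative one. Deciding polarity from those cues would need additional rules or a lexicon, which this sketch does not attempt.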

A Comparison of Search Functionalities in Several Tools Used for Searching within Digital Text Collections

Proceedings of the Association for Information Science and Technology ◽

10.1002/pra2.527 ◽

2021 ◽

Vol 58 (1) ◽

pp. 679-681

Author(s):

Liezl H. Ball ◽

Theo J.D. Bothma

Keyword(s):

Digital Text ◽

Text Collections

JOKE STRATEGIES IN AMERICAN SITUATIONAL COMEDY “HOW I MET YOUR MOTHER”

JURNAL ARBITRER ◽

10.25077/ar.4.1.38-51.2017 ◽

2017 ◽

Vol 4 (1) ◽

pp. 38

Author(s):

Awliya Rahmi

Keyword(s):

Data Analysis ◽

Data Collection ◽

Formal Methods ◽

Dominant Strategy ◽

Research Data ◽

Psychological Defense ◽

Part Of Speech ◽

Linguistic Research ◽

Observation Method

This research discusses joke strategies in the American situational comedy How I Met Your Mother (HIMYM). Its purpose is to identify (1) the joke strategies in HIMYM, (2) the pragmatic meanings of the jokes expressed by the characters, and (3) the pragmatic functions of the jokes expressed by the characters. The research is categorized as descriptive linguistic research. The observation method was applied in data collection, while the distribution and matching methods were applied in data analysis. The results of the data analysis are presented using informal and formal methods. The analysis found 14 joke strategies used by the characters in HIMYM, namely: ambiguity, grammar, syllabics, idiomatics, questionable English, antonymics, style, negativism, lexicography, spelling, punctuation, rhyming English, numerical English, and part of speech. The dominant strategy is ambiguity, because many English words have more than one meaning and are likely to lead the listener to multiple interpretations. The jokes uttered by the characters have assertive, expressive, and directive meanings. Moreover, the jokes also serve to show the speaker's power, solidarity, and psychological defense.

The capability of search tools to retrieve words with specific properties from large text collections

10.47989/irisic2030 ◽

2020 ◽

Author(s):

Liezl Ball ◽

Theo Bothma

Keyword(s):

Interface Design ◽

Morphological Data ◽

Digital Text ◽

Retrieval Method ◽

Fine Grained ◽

Search Results ◽

Text Collections ◽

Item Level ◽

Bibliographic Data

Introduction. With the increase in the availability of digital text collections for humanities researchers, tools to enable enhanced retrieval are required. If words with very specific properties could be retrieved from a text collection, more accurate linguistic and other analyses could be made. There is a range of properties and metadata that could be specified for retrieval, from morphological data up to bibliographic data. Furthermore, the bibliographic data should not only be at item level but should be extended to the text level. For example, in an anthology each section could be encoded with the author of that section. Such extended metadata will enable fine-grained retrieval. Method. In this study, current tools were evaluated to determine to what extent they allow users to retrieve words with specific properties from a text collection. Analysis. The analysis is limited to the following criteria: interface design, metadata, search options, filtering, and search results. Results. Currently, it is not possible for a user to retrieve words with specific properties from a text collection. Conclusion. An extended set of metadata should be used to encode text to enable retrieval of words at a fine-grained level.
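
The anthology example in the abstract above, where each section carries its own author in addition to item-level bibliographic data, can be sketched as a small data structure with a filter over it. This is only an illustration under stated assumptions, not the encoding scheme or any tool evaluated in the study: the class names `Token`, `Section`, and `Item`, the function `find_tokens`, the part-of-speech labels, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str    # word as it appears in the text
    lemma: str   # morphological data: dictionary form
    pos: str     # morphological data: part of speech (assumed label set)


@dataclass
class Section:
    author: str            # text-level metadata: author of this section
    tokens: List[Token]


@dataclass
class Item:
    title: str             # item-level bibliographic metadata
    year: int
    sections: List[Section]


def find_tokens(items: List[Item], *, lemma: Optional[str] = None,
                pos: Optional[str] = None,
                author: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Return (item title, section author, word form) for tokens matching all given filters."""
    results = []
    for item in items:
        for section in item.sections:
            if author is not None and section.author != author:
                continue  # filter on text-level metadata
            for tok in section.tokens:
                if lemma is not None and tok.lemma != lemma:
                    continue
                if pos is not None and tok.pos != pos:
                    continue
                results.append((item.title, section.author, tok.form))
    return results


# Assumed sample data: one anthology with two sections by different authors.
anthology = Item(title="Example Anthology", year=2020, sections=[
    Section(author="Author A", tokens=[Token("runs", "run", "VERB"),
                                       Token("fast", "fast", "ADV")]),
    Section(author="Author B", tokens=[Token("run", "run", "NOUN"),
                                       Token("running", "run", "VERB")]),
])
matches = find_tokens([anthology], lemma="run", pos="VERB", author="Author B")
print(matches)
```

With only item-level metadata, a query for the verb "run" could be narrowed to the anthology as a whole; storing the author on each section is what lets the same query be restricted to a single contributor.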
