An open stylometric system based on multilevel text analysis

Author(s): Maciej Eder, Maciej Piasecki, Tomasz Walkowiak

Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set if stylometric tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype of an open stylometric system that facilitates wide use through two aspects: technical flexibility and research flexibility. The system relies on a server installation combined with a web-based user interface, which frees the user from having to install any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: not only the usual lexical level, but also deep-level linguistic features. This enables a range of applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis), and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.
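The abstract does not spell out the stylometric computations. As an illustration of analysis at the lexical level, here is a minimal Python sketch of Burrows's Classic Delta, a distance measure of the kind Stylo offers; it is not the system's own code, and the function names and toy texts are hypothetical. It takes the most frequent words in the corpus, z-scores their relative frequencies across texts, and averages the absolute z-score differences between each pair of texts.

```python
import statistics
from collections import Counter

def relative_freqs(tokens):
    """Relative frequency of each word type in one tokenized text."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}

def classic_delta(texts, n_mfw=100):
    """Pairwise Classic Delta matrix (sketch, naive whitespace tokenization).

    1. Take the n_mfw most frequent words in the whole corpus.
    2. Z-score each word's relative frequency across the texts.
    3. Delta(i, j) = mean absolute difference of the z-scores of texts i and j.
    """
    tokenized = [t.lower().split() for t in texts]
    corpus = Counter(w for toks in tokenized for w in toks)
    mfw = [w for w, _ in corpus.most_common(n_mfw)]
    freqs = [relative_freqs(toks) for toks in tokenized]

    # One z-score vector per text; words with zero spread carry no signal and are skipped.
    zscores = [[] for _ in texts]
    for w in mfw:
        values = [f.get(w, 0.0) for f in freqs]
        sd = statistics.stdev(values)
        if sd == 0:
            continue
        mean = statistics.mean(values)
        for i, v in enumerate(values):
            zscores[i].append((v - mean) / sd)

    n = len(texts)
    return [[statistics.mean(abs(a - b) for a, b in zip(zscores[i], zscores[j]))
             for j in range(n)] for i in range(n)]

# Hypothetical toy texts: the first two share exactly the same word frequencies.
texts = [
    "the cat sat on the mat",
    "on the mat the cat sat",
    "a dog ran in the park",
]
delta = classic_delta(texts)
for row in delta:
    print([round(d, 3) for d in row])
```

Lower Delta means more similar word-frequency profiles, so the first two toy texts come out at zero distance. Real analyses use hundreds of most frequent words and far longer texts, and the resulting distance matrix is what clustering or tree-drawing steps consume.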

Author(s): Natalie Shapira, Gal Lazarus, Yoav Goldberg, Eva Gilboa-Schechtman, Rivka Tuval-Mashiach, ...

Heritage, 2021, Vol 4 (2), pp. 612-640
Author(s): Nikolaos Partarakis, Danai Kaplanidi, Paraskevi Doulgeraki, Effie Karuzaki, Argyro Petraki, ...

This paper presents a knowledge representation framework and provides tools for representing and presenting the tangible and intangible dimensions of culinary tradition as cultural heritage, including the socio-historic context of its evolution. The representation framework adheres to and extends the knowledge representation standards for the Cultural Heritage (CH) domain, while providing a widely accessible web-based authoring environment to facilitate the representation activities. Developed in close collaboration with the social sciences and humanities, this work allows ethnographic research outcomes to be exploited by providing a systematic approach to representing culinary tradition in the form of recipes, both in an abstract form for their preservation and as a semantic representation of their execution, captured on-site during ethnographic research.


2021
Author(s): César E. Montiel Olea, Leonardo R. Corral

Project Completion Reports (PCRs) are the main instrument through which multilateral organizations measure the success of a project once it closes. PCRs are important for development effectiveness: they help in understanding the achievements, failures, and challenges within the project cycle, and these lessons can feed back into the design and execution of new projects. The aim of this paper is to introduce text analysis tools for exploring PCR documents. We describe different text analysis tools and apply them to explore the content of a sample of PCRs. We seek to illustrate how PCRs can be summarized and analyzed by applying innovative tools to a unique dataset. We believe that the methods presented here have numerous potential applications to the different types of text documents routinely prepared within the Inter-American Development Bank (IDB).
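The abstract does not name the specific tools used. One common way to summarize a collection of reports is to rank each document's terms by TF-IDF, so that words frequent in one report but rare across the collection stand out. The sketch below assumes whitespace tokenization, no stop-word list, and hypothetical report snippets; it illustrates the general technique, not the paper's method.

```python
import math
from collections import Counter

def tfidf_top_terms(docs, top_n=3):
    """Return the top_n terms of each document ranked by TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for toks in tokenized for term in set(toks))
    top_terms = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency (normalized by document length) times inverse document frequency.
        scores = {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top_terms.append(ranked[:top_n])
    return top_terms

# Hypothetical report snippets, not taken from the IDB dataset.
reports = [
    "the project improved water supply and water quality",
    "the project faced delays in procurement and procurement costs rose",
    "the project trained teachers and teachers reported gains",
]
for terms in tfidf_top_terms(reports):
    print(terms)
```

Words that appear in every report ("the", "project", "and") get an IDF of zero, so each report's distinctive vocabulary rises to the top.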


Author(s): Ángela Almela, Gema Alcaraz-Mármol, Arancha García-Pinar, Clara Pallejá

This paper presents the methods used to develop a database of Spanish writing for forensic linguistic research, including our data collection procedures. Specifically, the main data collection instrument was adapted from Chaski (2001) and translated into Spanish. It consists of ten tasks in which subjects are asked to write formal and informal texts on different topics. To date, 93 undergraduates from Spanish universities, as well as prisoners convicted of gender-based abuse, have participated in the study. A twofold analysis has been performed, approaching the collected data from both a semantic and a morphosyntactic perspective. For the semantic analysis, psycholinguistic categories have been used, many of them taken from the LIWC dictionary (Pennebaker et al., 2001). To obtain a more comprehensive depiction of the linguistic data, additional ad hoc categories have been created from the corpus itself and validated with a double-check method to ensure inter-rater reliability. For the morphosyntactic analysis, the natural language processing tool ALIAS TATTLER is being developed for Spanish. The results show that it is possible to distinguish non-abusers from abusers with high accuracy based on linguistic features.
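LIWC-style analysis counts how many words in a text fall into each dictionary category and reports the counts as percentages of total words; dictionary entries ending in `*` match any word beginning with that stem. The sketch below illustrates that counting scheme with a tiny hypothetical Spanish dictionary (the real LIWC dictionary is licensed and much larger) and naive whitespace tokenization; it does not reproduce the authors' categories or ALIAS TATTLER.

```python
from collections import Counter

# Hypothetical mini-dictionary for illustration only (not LIWC's actual entries).
# A trailing "*" matches any word that starts with the stem.
DICTIONARY = {
    "negemo": ["odio", "enfad*", "dolor*"],
    "social": ["amig*", "familia", "habl*"],
}

def matches(word, pattern):
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern

def category_percentages(text, dictionary=DICTIONARY):
    """Percentage of words in `text` that fall into each category.

    A word may count toward several categories, as in LIWC-style dictionaries.
    """
    words = text.lower().split()
    hits = Counter()
    for word in words:
        for category, patterns in dictionary.items():
            if any(matches(word, p) for p in patterns):
                hits[category] += 1
    return {category: 100 * hits[category] / len(words) for category in dictionary}

print(category_percentages("mi familia y mis amigos hablan pero yo odio esto"))
```

Per-text percentages like these become the feature vectors that a classifier can use to separate groups of writers.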


2021, pp. 146144482110672
Author(s): Nina Savela, David Garcia, Max Pellert, Atte Oksanen

This study, grounded in computational social science and social psychology, investigated sentiment as well as life-domain, motivational, and temporal themes in social media discussions about robotic technologies. We retrieved text comments from the Reddit social media platform in March 2019 based on the following six robotic technology concepts: robot (N = 3,433,554), AI (N = 2,821,614), automation (N = 879,092), bot (N = 21,559,939), intelligent agent (N = 15,119), and software agent (N = 18,324). The comments were processed using the VADER and LIWC text analysis tools and further analyzed with logistic regression models. Compared with the other four concepts, robot and AI were used less often in positive contexts. Comments addressing themes of leisure, money, and the future were associated with positive sentiment, whereas comments addressing home, power, and the past were associated with negative sentiment. The results show how context and terminology affect emotionality in conversations about robotic technology.
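The abstract does not detail the regression specification. A minimal sketch of the general idea: label each comment as positive or not using the compound-score cut-off recommended in VADER's documentation (compound >= 0.05), then fit a logistic regression with a single binary predictor (robot/AI versus the other concepts). With one binary predictor, the maximum-likelihood estimates have a closed form, so no optimizer is needed. The toy scores and helper names are hypothetical, and the study's actual models may include other predictors.

```python
import math

# VADER's documentation treats compound >= 0.05 as positive,
# <= -0.05 as negative, and anything in between as neutral.
POSITIVE_THRESHOLD = 0.05

def is_positive(compound):
    return 1 if compound >= POSITIVE_THRESHOLD else 0

def logit(p):
    return math.log(p / (1 - p))

def fit_single_binary_logit(rows):
    """Maximum-likelihood logistic regression of y on one binary predictor x.

    With a single 0/1 predictor the MLE has a closed form:
      intercept b0 = logit(P(y=1 | x=0))
      slope     b1 = logit(P(y=1 | x=1)) - b0   (the log odds ratio)
    Each group needs both outcomes, otherwise the MLE does not exist.
    """
    n = {0: 0, 1: 0}
    pos = {0: 0, 1: 0}
    for x, y in rows:
        n[x] += 1
        pos[x] += y
    b0 = logit(pos[0] / n[0])
    b1 = logit(pos[1] / n[1]) - b0
    return b0, b1

# Hypothetical toy data: (concept, VADER compound score). Not from the study.
comments = [
    ("robot", 0.6), ("robot", -0.4), ("robot", 0.0), ("robot", -0.2),
    ("bot", 0.7), ("bot", 0.3), ("bot", -0.1), ("bot", 0.5),
]
rows = [(1 if concept in {"robot", "ai"} else 0, is_positive(score))
        for concept, score in comments]
b0, b1 = fit_single_binary_logit(rows)
print(f"intercept={b0:.3f}, slope={b1:.3f}, odds ratio={math.exp(b1):.3f}")
```

A negative slope means lower odds of a positive comment when the concept is robot or AI; on this toy data the odds ratio is 1/9.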


Author(s): Naifu Zhang, Xiaohe Yu, Xinchao Zhang, Sheena D’Arcy

Summary: Hydrogen–Deuterium eXchange coupled to mass spectrometry is a powerful tool for analyzing protein dynamics and interactions. Bottom-up experiments looking at differences in deuterium uptake between conditions are the most common. These produce multidimensional data that can be challenging to depict in a single visual format. Each user must also set significance thresholds to define meaningful differences and make these apparent in the data presentation. To assist in this process, we have created HD-eXplosion, an open-source, web-based application for generating chiclet and volcano plots with statistical filters. HD-eXplosion fills a gap among available software packages and produces customizable, publication-quality plots.

Availability and implementation: The HD-eXplosion application is available at http://hd-explosion.utdallas.edu. The source code can be found at https://github.com/HD-Explosion.
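A volcano plot places each peptide at x = uptake difference and y = -log10(p), and a statistical filter highlights the peptides that pass both an effect-size threshold and a significance threshold. The sketch below shows only that filtering logic; the threshold values, field names, and the upstream per-peptide test are assumptions, not HD-eXplosion's documented defaults.

```python
import math
from dataclasses import dataclass

@dataclass
class PeptideDiff:
    peptide: str        # peptide identifier (hypothetical field)
    uptake_diff: float  # mean deuterium uptake, state B minus state A (Da)
    p_value: float      # p-value from some per-peptide test between the two states

def volcano_points(results, diff_threshold=0.5, alpha=0.01):
    """Return (peptide, x, y, significant) tuples for a volcano plot.

    x = uptake difference, y = -log10(p). A peptide is flagged only if it
    clears BOTH the effect-size threshold and the p-value threshold.
    """
    points = []
    for r in results:
        y = -math.log10(r.p_value)
        significant = abs(r.uptake_diff) >= diff_threshold and r.p_value < alpha
        points.append((r.peptide, r.uptake_diff, y, significant))
    return points

# Hypothetical peptide results for illustration.
data = [
    PeptideDiff("pep1", 0.8, 0.001),
    PeptideDiff("pep2", 0.8, 0.2),
    PeptideDiff("pep3", -0.1, 0.0001),
    PeptideDiff("pep4", -0.9, 0.004),
]
for name, x, y, sig in volcano_points(data):
    print(f"{name}: x={x:+.2f} y={y:.2f} significant={sig}")
```

Requiring both criteria keeps statistically strong but negligible changes (pep3) and large but noisy ones (pep2) out of the highlighted set.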


Author(s): Baguma Asuman, Md. Shahadat Hossain Khan, Che Kum Clement

This article reports on the barriers teachers encounter when integrating web-based learning (WBL) into higher education institutions in Uganda, and on possible solutions. A total of 50 teachers from the departments of ICT, management, and social sciences at five universities were purposively selected. A self-designed questionnaire was adapted to collect participants' responses. Both quantitative and qualitative methods were used to analyze the data. The findings indicate that teachers had a positive attitude toward incorporating WBL into the teaching and learning process, but encountered difficulties such as slow internet speeds, insufficient web-based tools, and a lack of technical support. The study further identifies possible enablers for overcoming these difficulties and contributes new empirical evidence to the existing literature. It also offers recommendations for overcoming these difficulties and incorporating WBL into teaching and learning in higher education, in Uganda in particular and in developing countries in general.


Author(s): Fion S.L. Lee, Kelvin C.K. Wong, William K.W. Cheung, Cynthia F.K. Lee

This chapter describes the use of a Web-based essay critiquing system and its integration into a series of composition workshops for a group of secondary school students in Hong Kong. It begins with a review and application of the hybrid learning approach, followed by a description of latent semantic analysis and of a methodology for corpus preparation. Next, the distributed computing architecture of the essay critiquing system is described. The chapter then explains how the system was integrated with the writing pedagogy implemented in the workshops and how the feasibility evaluation results were derived. The positive results confirm the benefits of hybrid learning.
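The chapter does not say how LSA is applied inside the critiquing system. As a rough illustration: LSA factorizes a term-document matrix with a truncated singular value decomposition, and a new text can be folded into the resulting latent space and compared with reference texts by cosine similarity. The pure-Python sketch below substitutes power iteration on AᵀA for a library SVD to stay self-contained; the corpus, function names, and k are hypothetical, and it omits the term weighting that real LSA systems usually apply.

```python
import math
from collections import Counter

def term_doc_matrix(docs):
    """Rows are vocabulary terms, columns are documents (raw counts)."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [Counter(d.split()) for d in docs]
    return vocab, [[c[t] for c in counts] for t in vocab]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def top_eigenpairs(B, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation.
    Assumes k <= rank(B); a real system would call an SVD routine instead."""
    B = [row[:] for row in B]
    pairs = []
    for _ in range(k):
        v = normalize([1.0] * len(B))
        for _ in range(iters):
            v = normalize(matvec(B, v))
        lam = sum(a * b for a, b in zip(v, matvec(B, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract the component just found so the next one dominates.
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(len(B))] for i in range(len(B))]
    return pairs

def lsa_fit(docs, k):
    vocab, A = term_doc_matrix(docs)
    n = len(docs)
    # B = A^T A: its eigenvectors are A's right singular vectors, eigenvalues sigma^2.
    B = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(B, k)
    doc_vecs = [[v[j] for _, v in pairs] for j in range(n)]  # rows of V_k
    return vocab, A, pairs, doc_vecs

def fold_in(text, vocab, A, pairs):
    """Project a new text: q_hat = Sigma_k^-1 U_k^T q. Since U_i = A v_i / sigma_i,
    each coordinate is (q . A v_i) / sigma_i^2."""
    c = Counter(text.split())
    q = [c[t] for t in vocab]
    return [sum(qi * ai for qi, ai in zip(q, matvec(A, v))) / lam for lam, v in pairs]

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# Hypothetical reference texts and student essay.
references = ["cat dog pet", "dog pet animal", "stock market price"]
vocab, A, pairs, doc_vecs = lsa_fit(references, k=2)
essay = fold_in("cat pet", vocab, A, pairs)
sims = [cosine(essay, d) for d in doc_vecs]
print([round(s, 3) for s in sims])
```

The essay never contains "dog" or "animal", yet it lands next to both pet-related references in the latent space, because LSA groups terms that co-occur across the reference corpus.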

